Preface



Recent years have witnessed the appearance of new paradigms for designing 
distributed applications where the application components can be relocated dy- 
namically across the hosts of the network. This form of code mobility lays the 
foundation for a new generation of technologies, architectures, models, and ap- 
plications in which the location at which the code is executed comes under the 
control of the designer, rather than simply being a configuration accident. 

Among the various flavors of mobile code, the mobile agent paradigm has 
become particularly popular. Mobile agents are programs able to determine au- 
tonomously their own migration to a different host, and still retain their code 
and state (or at least a portion thereof). Thus, distributed computations do not 
necessarily unfold as a sequence of requests and replies between clients and re- 
mote servers, rather they encompass one or more visits of one or more mobile 
agents to the nodes involved. 

Mobile code and mobile agents hold the potential to shape the next genera- 
tion of technologies and models for distributed computation. The first steps of 
this process are already evident today: Web applets provide a case for the least 
sophisticated form of mobile code, Java-based distributed middleware makes in- 
creasing use of mobile code, and the first commercial applications using mobile 
agents are starting to appear. 

This volume contains the proceedings of the Fifth International Conference 
on Mobile Agents (MA 2001). MA 2001 took place in Atlanta, Georgia, USA, at 
the Georgia Center for Advanced Telecommunications Technology (GCATT), on 
December 2-4, 2001. The ambitious goal of MA 2001 was to gather researchers 
and practitioners from all over the world and shed some light on the open issues 
related to the exciting research topic of code mobility. 

The first conference in this series was held in 1997 in Berlin, and since then 
it has been, by number of attendees and by quality and breadth of the rese- 
arch disseminated, among the top events for the community of researchers and 
practitioners interested in mobile code and mobile agents. The previous two con- 
ferences were held together with the International Symposium on Agent Systems 
and Applications (ASA) as joint ASA/MA events that aimed at gathering resear- 
chers interested in all the flavors of agent systems, e.g., including also intelligent 
and non-mobile agents. Although these joint events were very successful, MA 
2001 was presented as a stand-alone event, entirely focused on the original tar- 
get of mobile code and mobile agents. Our goal with this and future events is to 
strengthen the MA conference as the international venue at which the best and 
latest results in the topics of mobile code and mobile agents are disseminated 
and discussed. 

The conference received 75 submissions from authors all over the world. 
The CyberChair system (www.cyberchair.org) greatly simplified the submission
and review process. The Program Committee, composed of 20 of the most
distinguished researchers in code mobility, reviewed all of the papers carefully.
Each paper was assigned to at least three reviewers, or four in the case of papers
authored by Program Committee members. Reviewers were asked to declare in
advance potential conflicts of interest, to allow a proper assignment of papers
and ensure fair reviews. Moreover, this information was used at the Program
Committee meeting, which took place in Milan at the end of May, where reviewers
with a conflict of interest on a paper were asked to leave the room during
the related discussion. After a full-day meeting, the Program Committee selected
the 18 papers included in the technical program.

In addition to these papers, we were honored that two distinguished experts 
accepted our invitation to give keynote presentations. Fred Schneider (Cornell 
University, USA) shared his views about the past, present, and future of mobile 
agent research, while Aleta Ricciardi (Valaran Corporation, USA) reported on 
her first-hand experience in applying code mobility within a real-world indu- 
strial context. The program was completed by a “Posters and Research Demos” 
session, and by four tutorials by leading experts in the field.

Conferences are the result of the concerted efforts of several people. First of 
all, I would like to express, personally and on behalf of the rest of the Organizing 
Committee, my appreciation to the authors of the submitted papers, and since- 
rely thank the members of the Program Committee and the external reviewers 
for their fundamental contribution to ensuring the quality of this conference. I 
would also like to thank the General Chair of MA 2001, David Kotz, and the 
rest of the Organizing Committee for their work in making this event a success. 
Finally, I would like to acknowledge and thank the IEEE Technical Committee 
on the Internet and the IEEE Computer Society for sponsoring the event, and 
Nokia and Georgia Tech College of Engineering for supporting it. 
General Mixed Integer Programming:
Computational Issues for Branch-and-Cut Algorithms



Alexander Martin 

Darmstadt University of Technology, Department of Mathematics, 
Schlossgartenstr. 7, D-64289 Darmstadt, Germany 



Abstract. In this paper we survey the basic features of state-of-the-art 
branch-and-cut algorithms for the solution of general mixed integer pro- 
gramming problems. In particular we focus on preprocessing techniques, 
branch-and-bound issues and cutting plane generation. 



1 Introduction 

A general mixed integer program (MIP) is the problem of optimizing a linear 
objective function subject to a system of linear constraints, where some or all 
of the variables are required to be integer. The solution of general mixed inte- 
ger programs is one of the challenging problems in discrete optimization. The 
problems that can be modeled as mixed integer programs arise, for instance, in 
science, technology, business, and environment, and their number is tremendous. 
It is therefore no wonder that many solution methods and codes exist for the 
solution of mixed integer programs, and not just a few of them are business 
oriented, see m for a survey on commercial linear and integer programming 
solvers. 

Branch-and-cut algorithms are among the most successful methods for solving
mixed integer programming problems. In Section 2 we outline the principal
structure of a branch-and-cut algorithm. The main ingredients are preprocessing,
the solution of the underlying linear programs, branch-and-bound issues, cut
generation, and primal heuristics. Our aim is to survey the
features that state-of-the-art branch-and-cut solvers for mixed integer programs
include. Most of the issues presented are fairly standard, but our intention
is to use this paper more as a textbook and to give the reader unfamiliar with this
subject an impression of how mixed integer programs are solved today. In detail,
we focus on preprocessing in Section 3, branch-and-bound issues in Section 4,
and cut generation in Section 5.

The software package that we use as a basic reference in this paper is SIP, 
which is currently developed at our institute and ZIB M- As mentioned most of 
the described issues are common to basically all state-of-the-art solvers and there 
are many other comparable codes that contain many of the described features. 
Among them are in particular ABACUS, developed at the University of Cologne 
[66, 40] [→ Elf/Gutwenger/Jünger/Rinaldi], bc-opt, developed at CORE [19],
CPLEX, developed at ILOG [39], MIPO, developed at Columbia University [6], 
MINTO, developed at Georgia Institute of Technology |^, SYMPHONY, developed 
at Cornell University and Lehigh University [→ Ladányi/Ralphs/Trotter], and
XPRESS-MP, developed at DASH [21]. 

It is common to use the library Miplib [10] as test set to evaluate certain 
features of a MIP code. Miplib is a collection of real-world mixed integer pro- 
gramming problems. From time to time we will refer to some instances from this 
library to explain certain phenomena. However, we will not give computational 
results here. The reader interested in concrete running times will find them, for 
instance, in [11, 19, 45, 49].

Of course, branch-and-cut algorithms are not the only successful way to 
solve general mixed integer programs. For an excellent survey on alternative 
approaches, including test sets, Gomory’s group approach and basis reduction, 
see [1].



2 Branch-and-Cut Algorithms 

In this section we sketch the main ideas of a branch-and-cut algorithm. More 
details and references on this subject can be found in the annotated bibliography 
m. Suppose we want to solve a mixed integer program 

$$\min\; c^T x \quad \text{s.t.} \quad Ax \le b, \qquad (1)$$

where $A \in \mathbb{Q}^{m \times n}$, $c \in \mathbb{Q}^n$, $b \in \mathbb{Q}^m$; the variables $x_i$ ($i = 1, \dots, n$) might
be binary ($x_i \in \{0,1\}$), integer ($x_i \in \mathbb{Z}$), or continuous ($x_i \in \mathbb{R}$). Let
$P_{IP} = \operatorname{conv}\{x \in \mathbb{R}^n : x \text{ is feasible for (1)}\}$. The first step of the algorithm is
to consider a relaxation of (1) by choosing a set $P' \subseteq \mathbb{R}^n$ with $P_{IP} \subseteq P'$ and
to optimize the linear objective function over $P'$. For example, this relaxation
might be the linear programming relaxation $\min\{c^T x : Ax \le b\}$ or a semidefinite
relaxation. We only consider linear relaxations; hence, the set $P'$ is always
a polyhedron.

Let $\bar{x}$ be an optimal solution for the linear relaxation. If $\bar{x}$ is integer and all
inequalities of $Ax \le b$ are satisfied by $\bar{x}$, we have found an optimal solution for
(1). Otherwise, there exists a hyperplane $\{x \in \mathbb{R}^n : a^T x = \alpha\}$ such that $a^T \bar{x} > \alpha$
and $P_{IP} \subseteq \{x \in \mathbb{R}^n : a^T x \le \alpha\}$. Such a hyperplane is called a cutting plane.
The problem of finding such a hyperplane is called the separation problem. More
precisely,

    given $\bar{x} \in \mathbb{R}^n$, decide whether $\bar{x} \in P_{IP}$. If not, find some valid inequality
    $a^T x \le \alpha$ for $P_{IP}$ such that $a^T \bar{x} > \alpha$.

It is well known that the separation problem for $P_{IP}$ and the optimization
problem $\min\{c^T x : x \in P_{IP}\}$ are polynomially equivalent, see |Hmrn . Sometimes,
the separation problem is restricted to a certain class of inequalities, in which
case we are searching for a violated inequality of that class. If we are able to
find such a cutting plane, we can strengthen the relaxation and continue; for
details see Section 5. This process is iterated until $\bar{x}$ is a feasible solution or
no more violated inequalities are found. In the latter case this so-called cutting
plane phase is embedded into an enumeration scheme. This is commonly done by
picking some fractional variable $\bar{x}_i$ that must be binary or integer and creating
two subproblems, one where one requires $x_i \ge \lceil \bar{x}_i \rceil$, and one where $x_i \le \lfloor \bar{x}_i \rfloor$;
see also the discussions in Section 4. The following algorithm summarizes the
whole procedure.

Algorithm 1. (Branch-and-Cut Algorithm)

(1)  Let $L$ be a list of unsolved problems. Initialize $L$ with (1).
(2)  Repeat
(3)      Choose a problem $\Pi$ from $L$ and delete it from $L$.
(4)      Repeat (iterate)
(5)          Solve the (linear) relaxation of $\Pi$. Let $\bar{x}$ be an optimal solution.
(6)          If $\bar{x}$ is feasible for $\Pi$, $\Pi$ is solved; goto (10).
(7)          Look for violated inequalities and add them to the LP.
(8)      Until there are no violated inequalities.
(9)      Split $\Pi$ into subproblems and add them to $L$.
(10) Until $L = \emptyset$.
(11) Print the optimal solution.
(12) STOP.

The list $L$ is usually organized as a binary tree, the so-called branch-and-bound
tree. Each (sub)problem $\Pi$ corresponds to a node in the tree, where the
unsolved problems are the leaves of the tree and the node that corresponds to
the entire problem (1) is the root. In the remainder of this section we discuss
some issues that can be found in basically every state-of-the-art branch-and-cut 
implementation. 
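The enumeration part of Algorithm 1 can be illustrated with a small sketch. It is a toy under several assumptions and not code from SIP: it treats a 0/1 knapsack problem (a maximization problem, so the LP value is an upper bound rather than a lower bound), solves the LP relaxation greedily instead of with the dual simplex method, omits the cutting plane loop of steps (7) and (8), and always picks the most recently created subproblem. The function names `lp_relaxation` and `branch_and_bound` are hypothetical.

```python
from fractions import Fraction


def lp_relaxation(values, weights, capacity, fixed):
    """LP relaxation of a 0/1 knapsack with some variables fixed to 0 or 1.

    Returns (bound, x), or None if the fixings alone exceed the capacity.
    Filling greedily by value/weight ratio is optimal for the knapsack LP.
    """
    n = len(values)
    x = [Fraction(0)] * n
    room = capacity
    bound = Fraction(0)
    for j, v in fixed.items():
        if v == 1:
            x[j] = Fraction(1)
            room -= weights[j]
            bound += values[j]
    if room < 0:
        return None
    free = sorted((j for j in range(n) if j not in fixed),
                  key=lambda j: Fraction(values[j], weights[j]), reverse=True)
    for j in free:
        if room <= 0:
            break
        take = min(Fraction(1), Fraction(room, weights[j]))
        x[j] = take
        room -= take * weights[j]
        bound += take * values[j]
    return bound, x


def branch_and_bound(values, weights, capacity):
    """Assumes non-negative values and positive integer weights."""
    best_value, best_x = Fraction(-1), None
    open_problems = [{}]                      # step (1): L holds the root problem
    while open_problems:                      # steps (2) and (10)
        fixed = open_problems.pop()           # step (3): newest problem first
        relaxed = lp_relaxation(values, weights, capacity, fixed)  # step (5)
        if relaxed is None:
            continue                          # infeasible subproblem
        bound, x = relaxed
        if bound <= best_value:
            continue                          # fathomed by the LP bound
        fractional = [j for j, xj in enumerate(x) if xj.denominator != 1]
        if not fractional:                    # step (6): LP solution is integral
            best_value, best_x = bound, [int(xj) for xj in x]
            continue
        j = fractional[0]                     # step (9): branch on x_j
        open_problems.append({**fixed, j: 0})
        open_problems.append({**fixed, j: 1})
    return best_value, best_x
```

Exact rational arithmetic is used so that the integrality test does not depend on a floating-point tolerance; a real solver would work with tolerances instead.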

LP-Management. We assume that the reader is familiar with linear program- 
ming techniques. A comprehensive treatment of this subject can be found in [m 
ES]. The method that is commonly used to solve the LPs within a branch-and-cut 
algorithm is the dual simplex algorithm, because an LP basis stays dual feasi- 
ble when adding cutting planes. There are fast and robust linear programming 
solvers available, see, for instance, [MED. 

Nevertheless, one major aspect in the design of a branch-and-cut algorithm 
is to control the size of the linear programs. To this end, inequalities are often 
assigned an “age” (at the beginning the age is set to 0). Each time the inequal- 
ity is not tight at the current LP solution, the age is increased by one. If the 
inequality gets too old, i. e., the age exceeds a certain limit, the inequality is 
eliminated from the LP. The value for this “age limit” varies from application 
to application. 
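The aging rule can be sketched in a few lines. This is only an illustration under assumptions: rows are stored as dictionaries with hypothetical keys `coefs`, `rhs`, and `age`, every row has the form $a^T x \le \text{rhs}$, and the age is never reset, since the text does not say whether a row that becomes tight again is rejuvenated.

```python
def age_and_purge(rows, x, age_limit, tol=1e-9):
    """Increase the age of rows that are not tight at x and drop rows that are too old.

    rows: list of dicts {"coefs": {var: coef}, "rhs": float, "age": int}
          representing inequalities sum(coef * x[var]) <= rhs.
    Returns the rows that stay in the LP.
    """
    kept = []
    for row in rows:
        activity = sum(a * x[j] for j, a in row["coefs"].items())
        if activity < row["rhs"] - tol:   # not tight at the current LP solution
            row["age"] += 1
        if row["age"] <= age_limit:       # too old rows are eliminated
            kept.append(row)
    return kept
```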

Another issue of LP-management concerns the questions: When should an 
inequality be added to the LP? When is an inequality considered to be “vio- 
lated”? And, how many and which inequalities should be added? The answers 
to these questions again depend on the application. It is clear that one always
makes sure that no redundant inequalities are added to the linear program. 

A commonly used data structure in this context is the pool. Violated inequalities
that are added to the LP are stored in this data structure. Inequalities
that are eliminated from the LP are also kept in the pool. The pool serves
to reconstruct the LPs when switching from one node in the branch-and-bound
tree to another and to keep inequalities that were “expensive” to separate
easily accessible in the ongoing solution process.

Heuristics. Raising the lower bound using cutting planes is one important
aspect of a branch-and-cut algorithm; finding good feasible solutions early to
enable fathoming of branches of the search tree is another. Primal heuristics
strongly depend on the application. A very common way to find feasible solutions 
for general mixed integer programs is to “plunge” from time to time at some 
node of the branch-and-bound tree, i. e., to dive deeper into the tree and look 
for feasible solutions. This plunging is done by alternatingly rounding/fixing 
some variables and solving linear programs, until all variables are fixed, the 
LP is infeasible, a feasible solution has been found, or the LP value exceeds 
the current best solution. This rounding heuristic can be detached from the 
regular branch-and-bound enumeration phase or considered within the global 
enumeration phase. The complexity of the heuristics and their sensitivity to
changes in the LP solutions influence how frequently they are called. More
information on this can be found, for instance, in [49, 19, 11].

Reduced Cost Fixing. The idea is to fix variables by exploiting the reduced
costs of the current optimal LP solution. Let $\bar{z} = c^T \bar{x}$ be the objective function
value of the current LP solution, $z_{IP}$ be an upper bound on the value of the
optimal solution, and $d = (d_i)_{i=1,\dots,n}$ the corresponding reduced cost vector.
Consider a non-basic variable $x_i$ of the current LP solution with finite lower and
upper bounds $l_i$ and $u_i$, and non-zero reduced cost $d_i$. Set

$$\delta = \frac{z_{IP} - \bar{z}}{|d_i|},$$

rounded down in case $x_i$ is a binary or an integer variable. Now, if $x_i$ is currently at its
lower bound $l_i$ and $l_i + \delta < u_i$, the upper bound of $x_i$ can be reduced to $l_i + \delta$. In
case $x_i$ is at its upper bound $u_i$ and $u_i - \delta > l_i$, the lower bound of variable $x_i$
can be increased to $u_i - \delta$. In case the new bounds $l_i$ and $u_i$ coincide, the variable
can be fixed to its bounds and removed from the problem. This strengthening
of the bounds is called reduced cost fixing. It was originally applied for binary
variables [2D], in which case the variable can always be fixed if the criterion
applies. There are problems where by the reduced cost criterion many variables
can be fixed, see, for instance, m- Sometimes, further variables can be fixed
by logical implications, for example, if some binary variable $x_i$ is fixed to one
by the reduced cost criterion and it is contained in an SOS constraint (i. e., a
constraint of the form $\sum_{j \in J} x_j \le 1$ with non-negative variables $x_j$, $j \in J$), all
other variables in this SOS constraint can be fixed to zero.
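A minimal sketch of the bound update for a single non-basic variable follows. It assumes the usual step length $\delta = (z_{IP} - \bar{z})/|d_i|$ for a minimization problem; the function name `reduced_cost_tighten` and its parameters are hypothetical, and the small tolerance added before rounding is an implementation assumption.

```python
import math


def reduced_cost_tighten(lb, ub, at_lower, d, z_lp, z_ip, is_integer):
    """Tighten the bounds of one non-basic variable by reduced cost fixing.

    lb, ub     : current finite bounds of the variable
    at_lower   : True if the variable sits at lb in the LP solution, else at ub
    d          : its non-zero reduced cost
    z_lp, z_ip : LP objective value and upper bound on the optimal value
    Returns the new pair (lb, ub); equal bounds mean the variable is fixed.
    """
    delta = (z_ip - z_lp) / abs(d)
    if is_integer:
        delta = math.floor(delta + 1e-9)   # round down for integer variables
    if at_lower and lb + delta < ub:
        ub = lb + delta                    # upper bound can be reduced
    elif not at_lower and ub - delta > lb:
        lb = ub - delta                    # lower bound can be increased
    return lb, ub
```

For a binary variable at its lower bound, any $\delta < 1$ rounds down to zero, so the variable is fixed to zero, which matches the remark that binary variables can always be fixed when the criterion applies.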

Enumeration Aspects. In our description of a branch-and-cut algorithm we
left open which problem to choose in Step (3) and how to split the
problem in Step (9). We discuss these issues in detail in Section 4.
3 Preprocessing

Before entering the branch-and-cut phase as outlined in Section 2, a preprocessing
step is usually applied first. Preprocessing aims at eliminating redundant
information from the problem formulation given by the user and simultaneously 
tries to strengthen the formulation by logical implications. Preprocessing can be 
very effective and sometimes it might not be possible to solve certain problems 
without a good preprocessing. This includes, for instance, Steiner tree problems 
m or set partitioning problems m- Typically, preprocessing is applied only 
once at the beginning of the solution procedure, but sometimes it pays to run 
the preprocessing routine more often on different nodes in the branch-and-bound
phase, see, for instance, muij. There is always the question of the break even 
point between the running time for preprocessing and the savings in the solu- 
tion time for the whole problem. There is no unified answer to this question. 
It depends on the individual problem, when intensive preprocessing pays and 
when not. In the following we discuss some preprocessing options and ways to 
implement them. Most of these options are incorporated in our code SIP and 
are drawn from 

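We extend our definition of a mixed integer program in (1) slightly and
consider it in the following more general form:

$$\begin{array}{rl} \min & c^T x \\ \text{s.t.} & Ax \;\{\le, =, \ge\}\; b, \\ & l \le x \le u, \\ & x \in \mathbb{Z}^N \times \mathbb{R}^C, \end{array} \qquad (2)$$

where $M$, $N$, and $C$ are finite sets with $N$ and $C$ disjoint, $A \in \mathbb{R}^{M \times (N \cup C)}$, $c, l, u \in \mathbb{R}^{N \cup C}$,
$b \in \mathbb{R}^M$. If some variable $x_i$, $i \in N$, is binary we have $l_i = 0$ and $u_i = 1$. If
some variable $x_j$ has no upper or lower bound, we assume that $l_j = -\infty$ or $u_j = +\infty$.
Again we define $P_{IP} = \operatorname{conv}\{x \in \mathbb{R}^n : x \text{ is feasible for (2)}\}$. Furthermore,
we denote by $s_i \in \{\le, =, \ge\}$ the sign of row $i$, i. e., (2) reads $\min\{c^T x : Ax \; s \; b,\ l \le x \le u,\ x \in \mathbb{Z}^N \times \mathbb{R}^C\}$.
In order to avoid too many subcases in the following
discussion we assume without loss of generality that there are no “greater than
or equal” inequalities, i.e., $s_i \in \{\le, =\}$. We consider the following cases:

Empty Rows. Suppose there is some row $i$ with no non-zero entry. If

$$(s_i = \text{‘}{\le}\text{’ and } b_i < 0) \quad \text{or} \quad (s_i = \text{‘=’ and } b_i \ne 0),$$

the problem is infeasible, otherwise row $i$ can be removed.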
Empty/Infeasible/Fixed Columns. For all columns $j$ check the following:
If

$$l_j > u_j,$$

the problem is infeasible. If

$$u_j = l_j,$$

fix column $j$ to its lower (or upper) bound, update the right-hand side, and
delete $j$ from the problem.

Suppose some column $j$ has no non-zero entry. If

$$(l_j = -\infty \text{ and } c_j > 0) \quad \text{or} \quad (u_j = \infty \text{ and } c_j < 0),$$

the problem is unbounded (or infeasible in case no feasible solution exists).
Otherwise, if

$$\left\{ \begin{array}{l} l_j > -\infty \\ u_j < \infty \\ -l_j = u_j = \infty \end{array} \right\} \text{ and } \left\{ \begin{array}{l} c_j \ge 0 \\ c_j \le 0 \\ c_j = 0 \end{array} \right\}, \text{ fix column } j \text{ to } \left\{ \begin{array}{l} l_j \\ u_j \\ 0 \end{array} \right\}.$$

Parallel Rows. Suppose we are given two rows $A_{i\cdot} x \; s_i \; b_i$ and $A_{j\cdot} x \; s_j \; b_j$. Rows $i$
and $j$ are called parallel if there is some $\alpha \in \mathbb{R}$ such that $\alpha A_{i\cdot} = A_{j\cdot}$. The
following situations might occur:

1. Conflicting constraints:
   a) $s_i$ = ‘=’, $s_j$ = ‘=’, and $\alpha b_i \ne b_j$
   b) $s_i$ = ‘=’, $s_j$ = ‘$\le$’, and $\alpha b_i > b_j$
   c) $s_i$ = ‘$\le$’, $s_j$ = ‘$\le$’, and $\alpha b_i > b_j$ ($\alpha < 0$)
   In any of these cases the problem is infeasible.

2. Redundant constraints:
   a) $s_i$ = ‘=’, $s_j$ = ‘=’, and $\alpha b_i = b_j$
   b) $s_i$ = ‘=’, $s_j$ = ‘$\le$’, and $\alpha b_i \le b_j$
   c) $s_i$ = ‘$\le$’, $s_j$ = ‘$\le$’, and $\alpha b_i \le b_j$ ($\alpha > 0$)
   d) $s_i$ = ‘$\le$’, $s_j$ = ‘$\le$’, and $\alpha b_i > b_j$ ($\alpha > 0$)
   In the first three cases row $j$ is redundant, in 2d) row $i$.

3. Range constraints:
   a) $s_i$ = ‘$\le$’, $s_j$ = ‘$\le$’, and $\alpha b_i = b_j$ ($\alpha < 0$)
      The two inequalities can be aggregated into one equation.
   b) $s_i$ = ‘$\le$’, $s_j$ = ‘$\le$’, and $\alpha b_i < b_j$ ($\alpha < 0$)
      In this case both inequalities can be aggregated into one range constraint
      of the form $A_{j\cdot} x + u = b_j$ with $0 \le u \le b_j - \alpha b_i$.

The question remains how to find parallel rows. Tomlin and Welsh [67] describe
an efficient procedure when the matrix $A$ is stored columnwise, and
Andersen and Andersen [2] slightly refine this approach. The idea is to use
a hash function such that rows in different baskets are not parallel. Possible
hash functions are the number of non-zeros, the index of the first and/or
the last non-zero entry of the row, the coefficient of the first non-zero entry, etc. In
practice, the baskets are rather small so that rows inside one basket can be
checked pairwise.
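The basket idea can be sketched as follows. This is not the procedure of Tomlin and Welsh; it is a simplified row-wise illustration that assumes rows are stored as dictionaries mapping column indices to non-zero integer (or `Fraction`) coefficients. The hash key combines the number of non-zeros with the first and last column index, and the names `bucket_key`, `parallel_factor`, and `find_parallel_rows` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def bucket_key(row):
    """Cheap hash: parallel rows share their support, hence this key."""
    cols = sorted(row)
    return (len(cols), cols[0], cols[-1])


def parallel_factor(row_i, row_j):
    """Return alpha with alpha * row_i == row_j, or None if the rows are not parallel."""
    if row_i.keys() != row_j.keys():
        return None
    first = next(iter(row_i))
    alpha = Fraction(row_j[first], row_i[first])
    if all(alpha * a == row_j[col] for col, a in row_i.items()):
        return alpha
    return None


def find_parallel_rows(rows):
    """Return all triples (i, j, alpha) with i < j and alpha * rows[i] == rows[j]."""
    buckets = defaultdict(list)
    for i, row in enumerate(rows):
        if row:                                   # empty rows are handled elsewhere
            buckets[bucket_key(row)].append(i)
    pairs = []
    for members in buckets.values():              # pairwise check inside one basket
        for pos, i in enumerate(members):
            for j in members[pos + 1:]:
                alpha = parallel_factor(rows[i], rows[j])
                if alpha is not None:
                    pairs.append((i, j, alpha))
    return pairs
```

Rows that land in different baskets are never compared, so the quadratic pairwise test only runs on the (typically small) baskets.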
Duality Fixing. Suppose there is some column $j$ with $c_j > 0$ that satisfies
$a_{ij} \ge 0$ if $s_i$ = ‘$\le$’, and $a_{ij} = 0$ if $s_i$ = ‘=’ for $i \in M$. If $l_j > -\infty$, we
can fix column $j$ to its lower bound. If $l_j = -\infty$ the problem is unbounded
or infeasible. The same arguments apply to some column $j$ with $c_j < 0$.
Suppose $a_{ij} \le 0$ if $s_i$ = ‘$\le$’, and $a_{ij} = 0$ if $s_i$ = ‘=’ for $i \in M$. If $u_j < \infty$, we
can fix column $j$ to its upper bound. If $u_j = \infty$ the problem is unbounded
or infeasible.

Singleton Rows. If there is some row $i$ that contains just one non-zero entry
$a_{ij} \ne 0$, for some $j \in N \cup C$ say, then we can update the bound of column $j$
in the following way. Initially let $-\bar{l}_j = \bar{u}_j = \infty$ and set

$$\bar{u}_j = b_i / a_{ij} \quad \text{if } (s_i = \text{‘}{\le}\text{’ and } a_{ij} > 0) \text{ or } s_i = \text{‘=’},$$

$$\bar{l}_j = b_i / a_{ij} \quad \text{if } (s_i = \text{‘}{\le}\text{’ and } a_{ij} < 0) \text{ or } s_i = \text{‘=’}.$$

If $\bar{u}_j < \max\{l_j, \bar{l}_j\}$ or $\bar{l}_j > \min\{u_j, \bar{u}_j\}$ the problem is infeasible. Otherwise,
we update the bounds by setting $l_j = \max\{l_j, \bar{l}_j\}$ and $u_j = \min\{u_j, \bar{u}_j\}$ and
remove row $i$. In case variable $x_j$ is integer (binary) we round down $u_j$ to
the next integer and $l_j$ up to the next integer. If the new bounds coincide
we can also delete column $j$ after updating the right-hand side accordingly.

Singleton Columns. Suppose there is some column $j$ with just one non-zero
entry $a_{ij} \ne 0$, for some $i \in M$ say. Let $x_j$ be a continuous variable with
no upper and lower bounds. If $s_i$ = ‘$\le$’ we know after duality fixing has
been applied that either $c_j \le 0$ and $a_{ij} > 0$ or $c_j \ge 0$ and $a_{ij} < 0$. In both
cases, there is an optimal solution satisfying row $i$ with equality. Thus we
can assume that $s_i$ = ‘=’. Now, we delete column $j$ and row $i$. After solving
the reduced problem we assign to variable $x_j$ the value

$$x_j = \frac{b_i - \sum_{k \ne j} a_{ik} x_k}{a_{ij}}.$$

Forcing and Dominated Rows. Here, we exploit the bounds on the variables
to detect so-called forcing and dominated rows. Consider some row $i$ and let

$$L_i = \sum_{j \in P_i} a_{ij} l_j + \sum_{j \in N_i} a_{ij} u_j, \qquad U_i = \sum_{j \in P_i} a_{ij} u_j + \sum_{j \in N_i} a_{ij} l_j, \qquad (3)$$

where $P_i = \{j : a_{ij} > 0\}$ and $N_i = \{j : a_{ij} < 0\}$. Obviously, $L_i \le \sum_{j=1}^{n} a_{ij} x_j \le U_i$.
The following cases might come up:

1. Infeasible row:
   a) $s_i$ = ‘=’, and $L_i > b_i$ or $U_i < b_i$
   b) $s_i$ = ‘$\le$’, and $L_i > b_i$
   In these cases the problem is infeasible.

2. Forcing row:
   a) $s_i$ = ‘=’, and $L_i = b_i$ or $U_i = b_i$
   b) $s_i$ = ‘$\le$’, and $L_i = b_i$
   Here, all variables in $P_i$ can be fixed to their lower (upper) bound and all
   variables in $N_i$ to their upper (lower) bound when $L_i = b_i$ ($U_i = b_i$). Row
   $i$ can be deleted afterwards.

3. Redundant row:
   a) $s_i$ = ‘$\le$’, and $U_i \le b_i$.

This row bound analysis can also be used to strengthen the lower and upper
bounds of the variables. Compute for each variable $x_j$

$$\bar{u}_{ij} = \begin{cases} (b_i - L_i)/a_{ij} + l_j, & \text{if } a_{ij} > 0 \\ (b_i - U_i)/a_{ij} + l_j, & \text{if } a_{ij} < 0 \text{ and } s_i = \text{‘=’} \\ (L_i - U_i)/a_{ij} + l_j, & \text{if } a_{ij} < 0 \text{ and } s_i = \text{‘}{\le}\text{’} \end{cases}$$

$$\bar{l}_{ij} = \begin{cases} (b_i - U_i)/a_{ij} + u_j, & \text{if } a_{ij} > 0 \text{ and } s_i = \text{‘=’} \\ (L_i - U_i)/a_{ij} + u_j, & \text{if } a_{ij} > 0 \text{ and } s_i = \text{‘}{\le}\text{’} \\ (b_i - L_i)/a_{ij} + u_j, & \text{if } a_{ij} < 0. \end{cases}$$

Let $\bar{u}_j = \min_i \bar{u}_{ij}$ and $\bar{l}_j = \max_i \bar{l}_{ij}$. If $\bar{u}_j \le u_j$ and $\bar{l}_j \ge l_j$, we speak
of an implied free variable. The simplex method might benefit from not
updating the bounds but treating variable $x_j$ as a free variable (note, setting
the bounds of $j$ to $-\infty$ and $+\infty$ will not change the feasible region). Free
variables will always be in the basis and are thus useful in finding a starting
basis. For mixed integer programs, however, it is better in general to update
the bounds by setting $u_j = \min\{u_j, \bar{u}_j\}$ and $l_j = \max\{l_j, \bar{l}_j\}$, because the
search region of the variable within an enumeration scheme is reduced. In
case $x_j$ is an integer (or binary) variable we round $u_j$ down to the next
integer and $l_j$ up to the next integer. As an example consider the following
inequality (taken from mod015 from the Miplib):

$$-45x_6 - 45x_{30} - 79x_{54} - 53x_{78} - 53x_{102} - 670x_{126} \le -443$$

Since all variables are binary, $L_i = -945$ and $U_i = 0$. For $j = 126$ we obtain
$\bar{l}_{ij} = (-443 + 945)/(-670) + 1 \approx 0.25$. After rounding up it follows that $x_{126}$
must be one.

Note that with these new lower and upper bounds on the variables it might 
pay to recompute the row bounds Li and Ui, which again might result in 
tighter bounds on the variables. 
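The bound strengthening for a single “less than or equal” row can be sketched as below. The sketch assumes a row $\sum_j a_{ij} x_j \le b_i$ with non-zero coefficients, finite bounds, and integer variables, and it only uses the two non-trivial formulas, $(b_i - L_i)/a_{ij} + l_j$ for $a_{ij} > 0$ and $(b_i - L_i)/a_{ij} + u_j$ for $a_{ij} < 0$. The function names and the rounding tolerance are assumptions made for illustration.

```python
import math


def activity_bounds(coefs, lower, upper):
    """Return (L_i, U_i), the minimal and maximal activity of the row."""
    lo = sum(a * (lower[j] if a > 0 else upper[j]) for j, a in coefs.items())
    hi = sum(a * (upper[j] if a > 0 else lower[j]) for j, a in coefs.items())
    return lo, hi


def tighten_bounds_le_row(coefs, rhs, lower, upper, integer=True):
    """Strengthen variable bounds using the row sum(a_j * x_j) <= rhs."""
    row_min, _ = activity_bounds(coefs, lower, upper)
    new_lower, new_upper = dict(lower), dict(upper)
    for j, a in coefs.items():
        if a > 0:
            bound = (rhs - row_min) / a + lower[j]      # implied upper bound
            if integer:
                bound = math.floor(bound + 1e-9)
            new_upper[j] = min(upper[j], bound)
        else:
            bound = (rhs - row_min) / a + upper[j]      # implied lower bound
            if integer:
                bound = math.ceil(bound - 1e-9)
            new_lower[j] = max(lower[j], bound)
    return new_lower, new_upper


# The mod015 row from the text: all variables are binary.
coefs = {6: -45, 30: -45, 54: -79, 78: -53, 102: -53, 126: -670}
lower = {j: 0 for j in coefs}
upper = {j: 1 for j in coefs}
new_lower, new_upper = tighten_bounds_le_row(coefs, -443, lower, upper)
```

On this row only the lower bound of variable 126 changes, from 0 to 1, in line with the conclusion in the text.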

Coefficient Reduction. The row bounds in (3) can also be used to reduce
coefficients of binary variables. Consider some row $i$ with $s_i$ = ‘$\le$’ and let
$x_j$ be a binary variable with $a_{ij} \ne 0$. If

$$\begin{array}{ll} a_{ij} < 0 \text{ and } U_i + a_{ij} < b_i, & \text{set } a'_{ij} = b_i - U_i, \\ a_{ij} > 0 \text{ and } U_i - a_{ij} < b_i, & \text{set } a'_{ij} = U_i - b_i \text{ and } b_i = U_i - a_{ij}, \end{array} \qquad (4)$$

where $a'_{ij}$ denotes the new reduced coefficient. Consider the following
inequality of example p0033 from the Miplib:

$$-230x_{10} - 200x_{16} - 400x_{17} \le -5$$

All variables are binary, $U_i = 0$, and $L_i = -830$. We have $U_i + a_{i,10} = -230 < -5$
and we can reduce $a_{i,10}$ to $b_i - U_i = -5$. The same can be done
for the other coefficients, and we obtain the inequality

$$-5x_{10} - 5x_{16} - 5x_{17} \le -5$$

Note that the operation of reducing coefficients to the value of the right-hand
side can also be applied to integer variables if all variables in this row have
negative coefficients and lower bound zero. In addition, we may compute the
greatest common divisor of the coefficients and divide all coefficients and the
right-hand side by this value. In case all involved variables are integer (or
binary) the right-hand side can be rounded down to the next integer. In our
example, the greatest common divisor is 5, and dividing by that number we
obtain the set covering inequality

$$-x_{10} - x_{16} - x_{17} \le -1.$$
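The two steps, coefficient reduction followed by division by the greatest common divisor, can be sketched for a row over binary variables with integer coefficients. The sketch assumes the reduction rule in the form "if $a_{ij} < 0$ and $U_i + a_{ij} < b_i$, set $a'_{ij} = b_i - U_i$; if $a_{ij} > 0$ and $U_i - a_{ij} < b_i$, set $a'_{ij} = U_i - b_i$ and $b_i = U_i - a_{ij}$", recomputes $U_i$ after every change, and leaves redundant rows untouched. The function names are hypothetical.

```python
from functools import reduce
from math import gcd


def reduce_coefficients(coefs, rhs):
    """Coefficient reduction for sum(a_j * x_j) <= rhs with binary x_j."""
    coefs = dict(coefs)
    for j in list(coefs):
        a = coefs[j]
        row_max = sum(c for c in coefs.values() if c > 0)   # U_i for binaries
        if row_max <= rhs:
            break                      # row is redundant; nothing to reduce
        if a < 0 and row_max + a < rhs:
            coefs[j] = rhs - row_max
        elif a > 0 and row_max - a < rhs:
            coefs[j] = row_max - rhs
            rhs = row_max - a
    return coefs, rhs


def divide_by_gcd(coefs, rhs):
    """Divide an integer row by the gcd of its coefficients, rounding rhs down."""
    g = reduce(gcd, coefs.values())
    if g <= 1:
        return dict(coefs), rhs
    return {j: a // g for j, a in coefs.items()}, rhs // g   # // floors the rhs


# The p0033 row from the text.
reduced, rhs = reduce_coefficients({10: -230, 16: -200, 17: -400}, -5)
covering, covering_rhs = divide_by_gcd(reduced, rhs)
```

Rounding the right-hand side down is only valid when all variables in the row are integer, as stated in the text.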

Aggregation. In mixed integer programs very often equations of the form

$$a_{ij} x_j + a_{ik} x_k = b_i$$

appear for some $i \in M$, $k, j \in N \cup C$. In this case, we may replace one of the
variables, $x_k$ say, by

$$\frac{b_i - a_{ij} x_j}{a_{ik}}. \qquad (5)$$

In case $x_k$ is binary or integer, the substitution is only possible if the term
(5) is guaranteed to be binary or integer as well. If this is true or $x_k$ is a
continuous variable, we aggregate the two variables. The new bounds of variable
$x_j$ are $l_j = \max\{l_j, (b_i - a_{ik} l_k)/a_{ij}\}$ and $u_j = \min\{u_j, (b_i - a_{ik} u_k)/a_{ij}\}$
if $a_{ik}/a_{ij} < 0$, and $l_j = \max\{l_j, (b_i - a_{ik} u_k)/a_{ij}\}$ and
$u_j = \min\{u_j, (b_i - a_{ik} l_k)/a_{ij}\}$ if $a_{ik}/a_{ij} > 0$.

Of course, aggregation can also be applied to equations whose support is
greater than two. However, this might cause additional fill in the matrix.
Hence, aggregation is usually restricted to constraints and columns with
small support.

Disaggregation. Disaggregation of columns is to our knowledge not an issue 
in preprocessing of mixed integer programs, since this usually blows up the 
solution space. It is however applied in interior point algorithms for linear
programs, because dense columns result in dense blocks in the Cholesky 
decomposition and are thus to be avoided |2^. 

On the other hand, disaggregation of rows is an important issue for mixed
integer programs. Consider the following inequality (taken from the Miplib
problem p0282)

$$x_{85} + x_{90} + x_{95} + x_{100} + x_{217} + x_{222} + x_{227} + x_{232} - 8x_{246} \le 0 \qquad (6)$$

where all variables involved are binary. The inequality says that whenever
one of the variables $x_i$ with $i \in S := \{85, 90, 95, 100, 217, 222, 227, 232\}$ is
one, $x_{246}$ must also be one. This fact can also be expressed by replacing (6)
by the following eight inequalities:

$$x_i - x_{246} \le 0 \quad \text{for all } i \in S. \qquad (7)$$

Concerning the LP-relaxation, this formulation is tighter. Whenever any
variable in $S$ is one, $x_{246}$ is forced to one as well, which is not guaranteed
in the original formulation. On the other hand, one constraint is replaced
by many (in our case 8) inequalities, which might blow up the constraint
matrix. However, within a cutting plane procedure this problem is not really
an issue, because the inequalities in (7) can be generated on demand.

Probing. Probing is sometimes used in general mixed integer programming
codes, see, for instance, [65]. The idea is to set some binary variable $x_j$
temporarily to zero or one and try to deduce further fixings from that. These
implications can be expressed in inequalities as follows:

$$\begin{array}{lcl} (x_j = 1 \Rightarrow x_i = \alpha) & \Rightarrow & x_i \ge l_i + (\alpha - l_i) x_j, \quad x_i \le u_i - (u_i - \alpha) x_j, \\ (x_j = 0 \Rightarrow x_i = \alpha) & \Rightarrow & x_i \ge \alpha - (\alpha - l_i) x_j, \quad x_i \le \alpha + (u_i - \alpha) x_j. \end{array} \qquad (8)$$

As an example, suppose we set in (6) variable $x_{246}$ temporarily to zero. This
implies that $x_i = 0$ for all $i \in S$. Applying (8) we deduce the inequality

$$x_i \le 0 + (1 - 0)\, x_{246} = x_{246}$$

for all $i \in S$, which is exactly (7).

In general, all these tests are iteratively applied until all of them fail. In
other words, the original formulation is strengthened as far as possible. Our
computational experiences m show that presolve reduces the problem sizes in
terms of number of rows, columns, and non-zeros by around 10%. The time
spent in presolve is negligible (below one per mille). It is also interesting to note
that for some problems presolve is indispensable for their solution. For example,
problem fixnetb from the Miplib is an instance where most solvers fail without
preprocessing, but with presolve the instance turns out to be very easy.
4 Branch-and-Bound Strategies

In the general outline of a branch-and-cut algorithm, see Algorithm 1 in Section
2, there are two steps in the branch-and-bound part that leave some choices. In
Step (3) of Algorithm 1 we have to select the next problem (node) from the list
of unsolved problems to work on, and in Step (9) we must decide on how to
split the problem into subproblems. Popular strategies are to branch on a variable
whose fractional part is closest to 0.5 and to choose a node with the worst dual
bound. In this section we briefly discuss some more alternatives. We will see that
they sometimes outperform the mentioned standard strategy. For a comprehensive study
of branch-and-bound strategies we refer to [44, 45] and the references therein. We
assume in this section that a general mixed integer program of the form (2) is
given.



4.1 Node Selection 

In the following we discuss three different strategies to select the node to be
processed next, see Step (3) of Algorithm 1.

1. Best First Search (bfs).

Here, a node is chosen with the worst dual bound, i.e., a node with lowest
lower bound, since we are minimizing in (2). The goal is to improve the dual
bound. However, if this fails early in the solution process, the branch-and-bound
tree tends to grow considerably, resulting in large memory requirements.

2. Depth First Search (dfs).

This rule chooses the node that is “deepest” in the branch-and-bound tree,
i.e., whose path to the root is longest. The advantage is that the tree
tends to stay small, since one of the two sons is always processed next
if the node could not be fathomed. This also implies that the linear
programs from one node to the next are very similar; usually the difference
is just the change of one variable bound, and thus the reoptimization is
fast. The main disadvantage is that the dual bound basically stays untouched
during the solution process, resulting in bad solution guarantees.
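The difference between the two rules comes down to the key by which open nodes are ordered. The sketch below keeps the open nodes in a heap and uses the dual bound as key for bfs and the negative depth as key for dfs. The class name `OpenNodes` and its interface are hypothetical, and ties are broken by insertion order, which is an arbitrary choice.

```python
import heapq
from itertools import count


class OpenNodes:
    """List of unsolved nodes, ordered by the chosen node selection rule."""

    def __init__(self, strategy="bfs"):
        if strategy not in ("bfs", "dfs"):
            raise ValueError("strategy must be 'bfs' or 'dfs'")
        self.strategy = strategy
        self._heap = []
        self._tie = count()          # insertion counter used as tie-breaker

    def push(self, dual_bound, depth, node):
        # bfs: smallest lower bound first; dfs: largest depth first.
        key = dual_bound if self.strategy == "bfs" else -depth
        heapq.heappush(self._heap, (key, next(self._tie), node))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

With the same three nodes pushed, bfs returns the node with the lowest dual bound while dfs returns the deepest one, regardless of its bound.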

3. Best Projection.

When selecting a node the most important question is, where are the good
(optimal) solutions hidden in the branch-and-bound tree? In other words, is
it possible to guess at some node whether it contains a better solution? Of
course, this is not possible in general. But there are some rules that evaluate
the nodes according to the potential of having a better solution. One such
rule is best projection. The earliest reference we found for this rule is a paper
of Mitra m who gives the credit to J. Hirst. Let $z(p)$ be the dual bound of
some node $p$, $z(\text{root})$ the dual bound of the root node, $z_{IP}$ the value of the
current best primal solution, and $s(p)$ the sum of the infeasibilities at node
$p$, i.e., $s(p) = \sum_{i \in N} \min\{\bar{x}_i - \lfloor \bar{x}_i \rfloor, \lceil \bar{x}_i \rceil - \bar{x}_i\}$, where $\bar{x}$ is the optimal LP
solution of node $p$ and $N$ the set of all integer variables. Let

$$\varrho(p) = z(p) + \frac{z_{IP} - z(\text{root})}{s(\text{root})} \cdot s(p).$$

The term $(z_{IP} - z(\text{root}))/s(\text{root})$ can be viewed as a measure for the change in the
objective function per unit decrease in infeasibility. The best projection rule
selects the node that minimizes $\varrho(\cdot)$.

The computational tests in [?] show that dfs finds by far the largest number 
of feasible solutions. This indicates that feasible solutions tend to lie deep 
in the branch-and-bound tree. In addition, the number of simplex iterations per 
LP is on average much smaller (around one half) for dfs than for bfs or best 
projection. This confirms our statement that reoptimizing a linear program is 
fast when just one variable bound is changed. However, dfs neglects the dual 
bound. For many of the more difficult problems the dual bound is not improved, 
resulting in very bad solution guarantees compared to the other two strategies. 
Best projection and bfs do better in this respect. There is no clear winner 
between the two: sometimes best projection outperforms bfs, but on average bfs 
is the best. Linderoth and Savelsbergh [45] compare further node selection 
strategies and come to the similar conclusion that there is no clear winner and 
that a sophisticated MIP solver should allow many different options for node 
selection. 
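
The three node selection rules can be sketched in a few lines. The following is only an illustration under stated assumptions: the `Node` record, its field names, and the sample numbers are hypothetical and not taken from any solver; each open node is assumed to carry its depth, its dual bound $z(p)$, and its sum of infeasibilities $s(p)$.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical record for an open branch-and-bound node."""
    name: str
    depth: int         # length of the path to the root
    dual_bound: float  # z(p), LP lower bound (we are minimizing)
    infeas: float      # s(p), sum of integer infeasibilities


def select_bfs(nodes):
    """Best first search: a node with the lowest lower bound."""
    return min(nodes, key=lambda p: p.dual_bound)


def select_dfs(nodes):
    """Depth first search: a node deepest in the tree."""
    return max(nodes, key=lambda p: p.depth)


def select_best_projection(nodes, z_root, s_root, z_ip):
    """Best projection: minimize z(p) + (z_IP - z(root)) / s(root) * s(p)."""
    slope = (z_ip - z_root) / s_root
    return min(nodes, key=lambda p: p.dual_bound + slope * p.infeas)


# Made-up data for illustration only.
open_nodes = [
    Node("a", depth=1, dual_bound=10.0, infeas=2.0),
    Node("b", depth=3, dual_bound=12.0, infeas=0.5),
    Node("c", depth=2, dual_bound=11.0, infeas=1.5),
]
chosen_bfs = select_bfs(open_nodes)
chosen_dfs = select_dfs(open_nodes)
chosen_bp = select_best_projection(open_nodes, z_root=9.0, s_root=3.0, z_ip=15.0)
```

On this toy data bfs picks the node with the weakest bound, while dfs and best projection both prefer the deep, almost integral node. Ties are broken by list order here; a real solver would need an explicit tie-breaking rule.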

4.2 Variable Selection 

In this section we discuss rules on how to split a problem into subproblems if 
it could not be fathomed in the branch-and-bound tree, see Step (9) of 
Algorithm 1. The only way to split a problem within an LP based 
branch-and-bound algorithm is to branch on linear inequalities in order to keep 
the property of having an LP relaxation at hand. The easiest and most common 
inequalities are trivial inequalities, i.e., inequalities that split the 
feasible interval of a single variable. To be more precise, if $j$ is some 
variable with a fractional value $\bar{x}_j$ in the current optimal LP 
solution, we obtain two subproblems, one by adding the trivial inequality 
$x_j \le \lfloor \bar{x}_j \rfloor$ (called the left subproblem or left son) 
and one by adding the trivial inequality $x_j \ge \lceil \bar{x}_j \rceil$ 
(called the right subproblem or right son). This rule of branching on trivial 
inequalities is also called branching on variables, because it does not 
actually require adding an inequality, but only changing the bounds of variable 
$j$. Branching on more complicated inequalities, or even splitting the problem 
into more than two subproblems, is rarely incorporated into general solvers, 
but turns out to be effective in special cases, see, for instance, [?]. In the 
following we present three variable selection rules. 

1. Most Infeasibility. 

This rule chooses the variable whose fractional part is closest to 0.5. The 
heuristic reason behind this choice is that this is a variable where the least 
tendency can be recognized to which “side” (up or down) the variable should be 
rounded. The hope is that a decision on this variable has the greatest impact 
on the LP relaxation. 

2. Pseudo-costs. 

This is a more sophisticated rule in the sense that it keeps a history of the 
success of the variables on which one has already branched. To introduce this 
rule, which goes back to [8], we need some notation. Let $\mathcal{P}$ denote 
the set of all problems (nodes) except the root node that have already been 
solved in the solution process. Initially, this set is empty. $\mathcal{P}^+$ 
denotes the set of all right sons, and $\mathcal{P}^-$ the set of all left 
sons, where $\mathcal{P} = \mathcal{P}^+ \cup \mathcal{P}^-$. For some problem 
$p \in \mathcal{P}$ let 

$f(p)$ be the father of problem $p$. 

$v(p)$ be the variable that has been branched on to obtain problem $p$ from 
the father $f(p)$. 

$x(p)$ be the optimal solution of the final linear program at node $p$. 

$z(p)$ be the optimal objective function value of the final linear program at 
node $p$. 

The up pseudo-cost of variable $j \in N$ is 

$$\Phi^+(j) = \frac{1}{|P_j^+|} \sum_{p \in P_j^+} \frac{z(p) - z(f(p))}{\lceil x_{v(p)}(f(p)) \rceil - x_{v(p)}(f(p))}, \tag{10}$$

where $P_j^+ \subseteq \mathcal{P}^+$. The down pseudo-cost of variable $j \in N$ is 

$$\Phi^-(j) = \frac{1}{|P_j^-|} \sum_{p \in P_j^-} \frac{z(p) - z(f(p))}{x_{v(p)}(f(p)) - \lfloor x_{v(p)}(f(p)) \rfloor}, \tag{11}$$

where $P_j^- \subseteq \mathcal{P}^-$. The terms 
$\frac{z(p) - z(f(p))}{\lceil x_{v(p)}(f(p)) \rceil - x_{v(p)}(f(p))}$ and 
$\frac{z(p) - z(f(p))}{x_{v(p)}(f(p)) - \lfloor x_{v(p)}(f(p)) \rfloor}$, 
respectively, measure the change in the objective function per unit decrease of 
infeasibility of variable $j$. There are many suggestions on how to choose the 
sets $P_j^+$ and $P_j^-$; for a survey see [?]. To name one possibility, 
following the suggestion of Eckstein [?] one could choose 
$P_j^+ := \{p \in \mathcal{P}^+ : v(p) = j\}$ and 
$P_j^- := \{p \in \mathcal{P}^- : v(p) = j\}$ if $j$ has already been 
considered as a branching variable, and otherwise set 
$P_j^+ := \mathcal{P}^+$ and $P_j^- := \mathcal{P}^-$. It remains to discuss 
how to weight the up and down pseudo-costs against each other to obtain the 
final pseudo-costs according to which the branching variable is selected. Here 
one typically sets 

$$\Phi(j) = \alpha_j^+ \Phi^+(j) + \alpha_j^- \Phi^-(j), \tag{12}$$

where $\alpha_j^+, \alpha_j^-$ are positive scalars. A variable that maximizes 
$\Phi(\cdot)$ is chosen to be the next branching variable. As formula (12) 
shows, the rule takes the previously obtained success of the variables into 
account when deciding on the next branching variable. The weakness of this 
approach is that at the very beginning there is no information available, and 
$\Phi(\cdot)$ is almost identical for all variables. Thus, at the beginning, 
where the branching decisions are usually the most critical, the pseudo-costs 
have no effect. The following rule tries to overcome this drawback. 

3. Strong Branching. 

The idea of strong branching, invented by CPLEX [38] (see also [?]), is to 
test, before actually branching on some variable, whether it indeed gives some 
progress. This testing is done by fixing the variable temporarily to its up and 
down value, i.e., to $\lceil \bar{x}_j \rceil$ and $\lfloor \bar{x}_j \rfloor$ 
if $\bar{x}_j$ is the fractional LP value of variable $j$, performing a certain 
fixed number of dual simplex iterations for each of the two settings, and 
measuring the progress in the objective function value. The testing is done, of 
course, not only for one variable but for a certain set of variables. Thus, the 
parameters of strong branching to be specified are the size of the candidate 
set, the maximum number of dual simplex iterations to be performed on each 
candidate variable, and a criterion according to which the candidate set is 
selected. Needless to say, each MIP solver has its own parameter settings; all 
are of heuristic nature, and their justification is based only on experimental 
results. 

The computational experiences in [49] show that branching on a most infeasible 
variable is by far the worst, measured in CPU time, in solution quality as well 
as in the number of branch-and-bound nodes. Using pseudo-costs gives much 
better results. The power of pseudo-costs becomes particularly apparent if the 
number of solved branch-and-bound nodes is large. In this case the function 
$\Phi(\cdot)$ properly represents the variables that are qualified for 
branching. In addition, the pseudo-costs can be computed at almost no cost. The 
statistics change when looking at strong branching. Strong branching is much 
more expensive than the other two strategies. This comes as no surprise, since 
in general the average number of dual simplex iterations per linear program is 
very small (for the Miplib, for instance, below 10 on average). Thus, the 
testing of a certain number of variables (even if it is small) in strong 
branching is relatively expensive. On the other hand, the number of 
branch-and-bound nodes is much smaller (around one half) compared to the 
pseudo-costs strategy. This decrease, however, does not in general completely 
compensate for the higher running times for selecting the variables. Thus, 
strong branching is normally not used as a default strategy, but can be a good 
choice for some hard instances. A similar report is given in [45], where 
Linderoth and Savelsbergh conclude that there is no branching rule that clearly 
dominates the others, though pseudo-cost strategies are essential to solve many 
instances. 
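
To make the bookkeeping behind formulas (10) to (12) concrete, here is a minimal sketch. The class `PseudoCosts`, its method names, and the sample numbers are assumptions made for illustration. The fallback to the averages over all sons follows the spirit of the choice attributed to Eckstein above, but is applied separately per direction, which is a simplification.

```python
import math
from collections import defaultdict


class PseudoCosts:
    """History of objective gains per unit of removed infeasibility."""

    def __init__(self):
        self.up = defaultdict(list)    # gains observed at right sons, per variable
        self.down = defaultdict(list)  # gains observed at left sons, per variable

    def record(self, j, x_father, z_father, z_son, is_up):
        """Store one summand of (10) or (11) for branching variable j."""
        gain = z_son - z_father
        if is_up:
            self.up[j].append(gain / (math.ceil(x_father) - x_father))
        else:
            self.down[j].append(gain / (x_father - math.floor(x_father)))

    @staticmethod
    def _mean(own, fallback):
        # Use the variable's own history if present, else all observations.
        sample = own or fallback
        return sum(sample) / len(sample) if sample else 0.0

    def score(self, j, alpha_up=1.0, alpha_down=1.0):
        """Weighted sum (12) of up and down pseudo-costs."""
        all_up = [g for gains in self.up.values() for g in gains]
        all_down = [g for gains in self.down.values() for g in gains]
        phi_up = self._mean(self.up.get(j), all_up)
        phi_down = self._mean(self.down.get(j), all_down)
        return alpha_up * phi_up + alpha_down * phi_down


def most_infeasible(x_lp):
    """Variable whose fractional part is closest to 0.5."""
    return min(x_lp, key=lambda j: abs(x_lp[j] - math.floor(x_lp[j]) - 0.5))


# Made-up branching history for illustration only.
pc = PseudoCosts()
pc.record(1, x_father=2.5, z_father=10.0, z_son=12.0, is_up=True)
pc.record(1, x_father=2.5, z_father=10.0, z_son=11.0, is_up=False)
pc.record(2, x_father=0.25, z_father=12.0, z_son=18.0, is_up=True)
pc.record(2, x_father=0.25, z_father=12.0, z_son=12.5, is_up=False)
```

Variable 3 has no history in this example, so its score is built from the averages over all recorded sons, which illustrates why the rule carries little information early in the search.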

5 Cutting Planes 

In this section we discuss cutting planes known from the literature that are 
incorporated in general MIP solvers. Cutting planes for integer programs may be 
classified with regard to the question whether their derivation requires 
knowledge about the structure of the underlying constraint matrix. In Section 
5.1 we describe cutting planes that do not exploit any structure. An 
alternative approach to obtain cutting planes for a mixed integer program is to 
derive relaxations associated with certain substructures of the underlying 
constraint matrix and to find valid inequalities for these relaxations. 
Crowder, Johnson and Padberg [20] pioneered this methodology by interpreting 
each single row of the constraint matrix as a knapsack relaxation and 
strengthening the integer program by adding violated knapsack inequalities. 
This will be the topic of Section 5.2. 



5.1 Cutting Planes Independent of Any Problem Structure 

Examples of families of cutting planes that do not exploit the structure of the 
constraint matrix are mixed integer Gomory cuts [25,27,28,16,63], mixed integer 
rounding cuts [?], and lift-and-project cuts. Marchand describes the merits of 
applying mixed integer rounding cuts, see also [→ Pochet]. Lift-and-project 
cuts are investigated in [5,6] and are comprehensively discussed in the article 
by Egon Balas in this book [→ Balas]. 

In this section we concentrate on Gomory’s mixed integer cuts. As a warm-up 
we start with the pure integer case. We will see that this approach (based on 
a rounding argument) fails if continuous variables are involved. In the general 
mixed integer case a disjunctive argument saves us. 



Pure Integer Programs 

Consider a pure integer program in the form 
$\min\{c^T x : Ax = b,\ x \in \mathbb{Z}^n_+\}$ with $A$, $b$ integer. Set 
$P_{IP} = \operatorname{conv}\{x \in \mathbb{Z}^n_+ : Ax = b\}$. Let $\bar{x}$ 
be an optimal solution of the LP relaxation $\min\{c^T x : x \in P\}$ with 
$P = \{x \in \mathbb{R}^n_+ : Ax = b\}$ and let $B \subseteq \{1, \ldots, n\}$ 
be a basis of $A$ with $\bar{x}_B = A_B^{-1} b - A_B^{-1} A_N \bar{x}_N$ and 
$\bar{x}_N = 0$, where $N = \{1, \ldots, n\} \setminus B$. 

If $\bar{x}$ is integer, we terminate with an optimal solution for 
$\min\{c^T x : x \in P_{IP}\}$. Otherwise, one of the values $\bar{x}_B$ must 
be fractional. Let $i \in B$ be some index with $\bar{x}_i \notin \mathbb{Z}$. 
Since every feasible integral solution $x \in P_{IP}$ satisfies 
$x_B = A_B^{-1} b - A_B^{-1} A_N x_N$, 

$$A_{i\cdot}^{-1} b - \sum_{j \in N} A_{i\cdot}^{-1} A_{\cdot j} x_j \in \mathbb{Z}. \tag{13}$$

The term on the left remains integral when adding integer multiples of $x_j$, 
$j \in N$, or an integer to $A_{i\cdot}^{-1} b$. We obtain 

$$f(A_{i\cdot}^{-1} b) - \sum_{j \in N} f(A_{i\cdot}^{-1} A_{\cdot j}) x_j \in \mathbb{Z}, \tag{14}$$

where $f(\alpha) = \alpha - \lfloor \alpha \rfloor$ for $\alpha \in \mathbb{R}$. 
Since $0 \le f(\cdot) < 1$ and $x \ge 0$, we conclude that 

$$f(A_{i\cdot}^{-1} b) - \sum_{j \in N} f(A_{i\cdot}^{-1} A_{\cdot j}) x_j \le 0,$$

or equivalently, 

$$\sum_{j \in N} f(A_{i\cdot}^{-1} A_{\cdot j}) x_j \ge f(A_{i\cdot}^{-1} b) \tag{15}$$

is valid for $P_{IP}$. Moreover, it is violated by the current linear 
programming solution $\bar{x}$, since $\bar{x}_N = 0$ and 
$f(A_{i\cdot}^{-1} b) = f(\bar{x}_i) > 0$. After subtracting 
$x_i + \sum_{j \in N} A_{i\cdot}^{-1} A_{\cdot j} x_j = A_{i\cdot}^{-1} b$ 
from (15) we obtain 

$$x_i + \sum_{j \in N} \lfloor A_{i\cdot}^{-1} A_{\cdot j} \rfloor x_j \le \lfloor A_{i\cdot}^{-1} b \rfloor, \tag{16}$$

which is, when the right-hand side is not rounded, a supporting hyperplane with 
integer left-hand side. Moreover, adding this inequality to the system $Ax = b$ 
preserves the property that all data are integral. Thus, the slack variable 
that is to be introduced for the new inequality can be required to be integer 
as well, and the whole procedure can be iterated. In fact, Gomory [28] proves 
that with a particular choice of the generating row such cuts lead to a finite 
algorithm, i.e., after adding a finite number of inequalities, an integer 
optimal solution is found. 
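
As a small numerical illustration of (15) and (16), the following sketch computes both forms of the cut from one tableau row $x_i + \sum_{j \in N} \bar{a}_j x_j = \bar{b}$. The function names and the sample row are assumptions chosen for illustration; exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction
from math import floor


def frac(a):
    """Fractional part f(a) = a - floor(a)."""
    return a - floor(a)


def gomory_fractional_cut(row, rhs):
    """Cut (15): sum_j f(a_j) x_j >= f(rhs), returned as (coeffs, f0).

    row maps each nonbasic index j to its tableau coefficient a_j.
    Returns None if the basic variable is already integral.
    """
    f0 = frac(rhs)
    if f0 == 0:
        return None
    return {j: frac(a) for j, a in row.items()}, f0


def rounded_cut(row, rhs):
    """Form (16): x_i + sum_j floor(a_j) x_j <= floor(rhs)."""
    return {j: floor(a) for j, a in row.items()}, floor(rhs)


# Hypothetical tableau row: x_i + 3/2 x_3 - 1/4 x_4 = 7/2.
row = {3: Fraction(3, 2), 4: Fraction(-1, 4)}
rhs = Fraction(7, 2)
cut15 = gomory_fractional_cut(row, rhs)
cut16 = rounded_cut(row, rhs)
```

For this row the fractional form reads $\tfrac12 x_3 + \tfrac34 x_4 \ge \tfrac12$ and the rounded form reads $x_i + x_3 - x_4 \le 3$; note that the fractional part of a negative coefficient is taken with respect to the floor.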

Later Chvátal [?] found a distinct but closely related way of finding a linear 
description of $P_{IP}$. He showed that using all supporting hyperplanes with 
integer left-hand side (an example of such a hyperplane is given in (16)) and 
rounding the right-hand sides again yields a polyhedron that contains 
$P_{IP}$. In addition, he proved that iterating this process a finite number of 
times provides $P_{IP}$. 



Mixed Integer Programs 

The two approaches discussed so far fail when both integer and continuous 
variables are present. Chvátal’s approach fails because the right-hand side of 
a supporting hyperplane cannot be rounded down. Gomory’s approach fails since 
it is no longer possible to add integer multiples to continuous variables to 
derive (14) from (13). For instance, 
$\frac{1}{3} + \frac{1}{3} x_1 - 2 x_2 \in \mathbb{Z}$ with 
$x_1 \in \mathbb{Z}_+, x_2 \in \mathbb{R}_+$ has a larger solution set than 
$\frac{1}{3} + \frac{1}{3} x_1 \in \mathbb{Z}$. As a consequence, we cannot 
guarantee that the coefficients of the continuous variables are non-negative 
and therefore cannot show the validity of (15). Nevertheless, it is possible to 
derive valid inequalities using the following disjunctive argument. 

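Property 1. Let $(a^k)^T x \le \alpha^k$ be a valid inequality for a polyhedron 
$P^k \subseteq \mathbb{R}^n_+$ for $k = 1, 2$. Then, 

$$\sum_{i=1}^{n} \min(a_i^1, a_i^2)\, x_i \le \max(\alpha^1, \alpha^2)$$

is valid for both $P^1 \cup P^2$ and $\operatorname{conv}(P^1 \cup P^2)$. 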
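
The argument behind Property 1 is short. The following sketch assumes, as the coefficient-wise minimum requires, that both polyhedra lie in the non-negative orthant, so that $x_i \ge 0$ for every point considered.

```latex
% Sketch of the argument; assumes P^1, P^2 \subseteq \mathbb{R}^n_+.
\text{For } x \in P^k,\ k \in \{1, 2\}: \qquad
\sum_{i=1}^{n} \min(a_i^1, a_i^2)\, x_i
\;\le\; \sum_{i=1}^{n} a_i^k x_i
\;\le\; \alpha^k
\;\le\; \max(\alpha^1, \alpha^2).
```

Hence the inequality holds on $P^1 \cup P^2$, and because it is a linear inequality it also holds for every convex combination of such points, i.e., on the convex hull of the union.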
This property applied in different ways yields valid inequalities for the mixed 
integer case. We present Gomory’s mixed integer cuts here; the other two, mixed 
integer rounding cuts and lift-and-project cuts, are both more or less also 
based on Property 1, see [→ Balas] and [→ Pochet]. 

Consider again the situation in (13), where $x_i$, $i \in B$, is required to be 
integer. We use the following abbreviations: 
$\bar{a}_j = A_{i\cdot}^{-1} A_{\cdot j}$, $\bar{b} = A_{i\cdot}^{-1} b$, 
$f_j = f(\bar{a}_j)$, $f_0 = f(\bar{b})$, and 
$N^+ = \{j \in N : \bar{a}_j \ge 0\}$ and $N^- = N \setminus N^+$. Expression 
(13) is equivalent to $\sum_{j \in N} \bar{a}_j x_j = f_0 + k$ for some 
$k \in \mathbb{Z}$. We distinguish two cases, 
$\sum_{j \in N} \bar{a}_j x_j \ge f_0 \ge 0$ and 
$\sum_{j \in N} \bar{a}_j x_j \le f_0 - 1 < 0$. In the first case, 

$$\sum_{j \in N^+} \bar{a}_j x_j \ge f_0$$

must hold. In the second case, we have 
$\sum_{j \in N^-} \bar{a}_j x_j \le f_0 - 1$, which is equivalent to 

$$-\frac{f_0}{1 - f_0} \sum_{j \in N^-} \bar{a}_j x_j \ge f_0.$$

Now we apply Property 1 to the disjunction 
$P^1 = P_{IP} \cap \{x : \sum_{j \in N} \bar{a}_j x_j \ge 0\}$ and 
$P^2 = P_{IP} \cap \{x : \sum_{j \in N} \bar{a}_j x_j \le 0\}$ and obtain the 
valid inequality 

$$\sum_{j \in N^+} \bar{a}_j x_j - \frac{f_0}{1 - f_0} \sum_{j \in N^-} \bar{a}_j x_j \ge f_0. \tag{17}$$

This inequality may be strengthened in the following way. Observe that the 
derivation of (17) remains unaffected when adding integer multiples to integer 
variables. By doing this we may put each integer variable either in the set 
$N^+$ or $N^-$. If a variable is in $N^+$, the final coefficient in (17) is 
$\bar{a}_j$ and thus the best possible coefficient after adding integer 
multiples is $f_j = f(\bar{a}_j)$. In $N^-$ the final coefficient in (17) is 
$-\frac{f_0}{1 - f_0} \bar{a}_j$ and thus $\frac{f_0 (1 - f_j)}{1 - f_0}$ is 
the best choice. Overall, we obtain the best possible coefficient by using 
$\min\bigl(f_j, \frac{f_0 (1 - f_j)}{1 - f_0}\bigr)$. This yields Gomory’s 
mixed integer cut [?] 

$$\sum_{\substack{j : f_j \le f_0 \\ j \text{ integer}}} f_j x_j
+ \sum_{\substack{j : f_j > f_0 \\ j \text{ integer}}} \frac{f_0 (1 - f_j)}{1 - f_0} x_j
+ \sum_{\substack{j \in N^+ \\ j \text{ non-integer}}} \bar{a}_j x_j
- \sum_{\substack{j \in N^- \\ j \text{ non-integer}}} \frac{f_0}{1 - f_0} \bar{a}_j x_j \ge f_0. \tag{18}$$

Gomory [?] shows that an algorithm based on iteratively adding these 
inequalities solves $\min\{c^T x : x \in X\}$ with 
$X = \{x \in \mathbb{Z}^N_+ \times \mathbb{R}^C_+ : Ax = b\}$ in a finite 
number of steps provided $c^T x \in \mathbb{Z}$ for all $x \in X$. 

Note that Gomory’s mixed integer cuts can always be applied, since the 
separation problem for the optimal LP solution is easy. However, adding these 
inequalities might cause numerical difficulties, see the discussion in [?]. In 
[7,11] it is shown how useful Gomory cuts are if they are incorporated in the 
right way. 

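
The coefficients of Gomory’s mixed integer cut (18) can be read off a single tableau row. The sketch below is illustrative only: the function name `gmi_cut`, the argument layout, and the sample row are assumptions, and a production implementation would additionally have to deal with the numerical issues mentioned above.

```python
from fractions import Fraction
from math import floor


def gmi_cut(row, rhs, integer_vars):
    """Gomory mixed integer cut from x_i + sum_j abar_j x_j = bbar.

    row maps nonbasic indices j to abar_j, rhs is bbar, and integer_vars
    is the set of nonbasic indices required to be integer.
    Returns (coeffs, f0), meaning sum_j coeffs[j] * x_j >= f0.
    """
    f0 = rhs - floor(rhs)
    if f0 == 0:
        raise ValueError("basic variable is already integral")
    ratio = f0 / (1 - f0)
    coeffs = {}
    for j, a in row.items():
        if j in integer_vars:
            fj = a - floor(a)
            # best of the two candidates f_j and f0 (1 - f_j) / (1 - f0)
            coeffs[j] = fj if fj <= f0 else ratio * (1 - fj)
        else:
            # continuous variables keep the coefficients of (17)
            coeffs[j] = a if a >= 0 else -ratio * a
    return coeffs, f0


# Hypothetical row: x_i + 3/2 x_3 - 1/8 x_4 + 2/3 x_5 - 1/3 x_6 = 11/4,
# with x_3, x_4 integer and x_5, x_6 continuous.
row = {3: Fraction(3, 2), 4: Fraction(-1, 8), 5: Fraction(2, 3), 6: Fraction(-1, 3)}
coeffs, f0 = gmi_cut(row, Fraction(11, 4), integer_vars={3, 4})
```

In this example $f_0 = 3/4$, so the cut reads $\tfrac12 x_3 + \tfrac38 x_4 + \tfrac23 x_5 + x_6 \ge \tfrac34$.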
5.2 Cutting Planes Exploiting Structure 

In this section we follow a different route to derive cutting planes and 
analyze the structure of the constraint matrix. The idea is to identify some 
substructure of $Ax \le b$ and use the polyhedral knowledge about this 
substructure to strengthen the original formulation. Let 
$A_{IJ} x_J \le b_I$ with $I \subseteq M$, $J \subseteq N \cup C$ be such a 
subsystem of (2). If $J = N \cup C$, we have that 
$P_{IP} \subseteq \{x \in \mathbb{Z}^N \times \mathbb{R}^C : A_{I\cdot} x \le b_I\} =: P_{rel}$ 
and any cutting plane valid for $P_{rel}$ is also valid for $P_{IP}$. Thus the 
task is to identify some substructure where one knows (part of) the polyhedral 
structure and to find violated cutting planes for this substructure. This 
approach was initiated by Crowder, Johnson and Padberg [20] for 0/1 integer 
programs, where each row of the constraint matrix was interpreted as a knapsack 
problem. Since this approach is still common in many MIP solvers and is still 
very successful, we describe some of the cutting planes known for the 0/1 
knapsack polytope that are used to strengthen general mixed integer programs. 
In case $J$ is a proper subset of $N \cup C$, a valid inequality for $P_{rel}$ 
is not necessarily valid for $P_{IP}$. In this case we have to resort to 
lifting. The main idea of lifting will be described at the end of this section. 
As we will see, lifting is also useful to strengthen valid inequalities. 



Knapsack Relaxations 

Consider the following polytope 

$$P_K(N, f, F) := \operatorname{conv}\{x \in \{0, 1\}^N : \sum_{j \in N} f_j x_j \le F\} \tag{19}$$

with some finite set $N$, weights $f_j \in \mathbb{Q}$, $j \in N$, and capacity 
$F \in \mathbb{Q}$. $P_K(N, f, F)$ is called the 0/1 knapsack polytope. We 
obtain a knapsack relaxation from our integer program (2) by taking some row 
$i$ and setting $f_j = a_{ij}$ and $F = b_i$, where we assume that all involved 
variables are binary. Thus any valid inequality for $P_K(N, f, F)$ is also 
valid for $P_{IP}$. In the following we summarize some of the inequalities 
known for the 0/1 knapsack polytope that are also used for the solution of 
integer programs. 

A set $S \subseteq N$ is called a cover if its weight exceeds the capacity, 
i.e., if $\sum_{i \in S} f_i > F$. With the cover $S$ one can associate the 
cover inequality 

$$\sum_{i \in S} x_i \le |S| - 1$$

that is valid for the knapsack polyhedron $P_K(N, f, F)$. If the cover is 
minimal, i.e., if $\sum_{i \in S \setminus \{s\}} f_i \le F$ for all 
$s \in S$, the inequality is called a minimal cover inequality (with respect to 
$S$). In [?] it was shown that the minimal cover inequality defines a facet of 
$P_K(S, f, F)$. 
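
The definitions of cover and minimal cover translate directly into code. The greedy routine below is only a simple heuristic sketch for finding a violated cover inequality and is an assumption of this illustration, not one of the separation algorithms from the literature; the function names and the sample LP point are hypothetical, and non-negative weights are assumed.

```python
def is_cover(S, f, F):
    """S is a cover if its total weight exceeds the capacity F."""
    return sum(f[i] for i in S) > F


def is_minimal_cover(S, f, F):
    """A cover that stops being a cover when any single item is removed."""
    total = sum(f[i] for i in S)
    return total > F and all(total - f[s] <= F for s in S)


def greedy_violated_cover(f, F, x_lp, eps=1e-9):
    """Heuristic: collect items with large LP value until a cover arises,
    shrink it to a minimal cover, and report whether the cover inequality
    sum_{i in S} x_i <= |S| - 1 is violated by x_lp."""
    S, weight = [], 0
    for i in sorted(f, key=lambda i: 1 - x_lp[i]):
        S.append(i)
        weight += f[i]
        if weight > F:
            break
    else:
        return None  # all items together fit: no cover exists
    for i in sorted(S, key=lambda i: x_lp[i]):
        if weight - f[i] > F:  # still a cover without item i
            S.remove(i)
            weight -= f[i]
    violated = sum(x_lp[i] for i in S) > len(S) - 1 + eps
    return sorted(S), violated


# Knapsack 5x1 + 5x2 + 5x3 + 5x4 + 3x5 + 8x6 <= 17 with a made-up LP point.
f = {1: 5, 2: 5, 3: 5, 4: 5, 5: 3, 6: 8}
F = 17
x_lp = {1: 1.0, 2: 1.0, 3: 1.0, 4: 0.4, 5: 0.0, 6: 0.0}
result = greedy_violated_cover(f, F, x_lp)
```

Here the LP point satisfies the knapsack constraint with equality but violates the cover inequality $x_1 + x_2 + x_3 + x_4 \le 3$.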

Another well-known class of knapsack inequalities are $(1,k)$-configuration 
inequalities that were introduced by Padberg [58]. A $(1,k)$-configuration 
consists of a feasible set $S$, i.e., a set $S$ such that $f(S) \le F$, and an 
additional item $z$ such that every subset of $S$ of cardinality $k$, together 
with $z$, forms a minimal cover. A $(1,k)$-configuration $S \cup \{z\}$ gives 
rise to the inequality 

$$\sum_{i \in S} x_i + (|S| - k + 1) x_z \le |S|,$$

which is called a $(1,k)$-configuration inequality (with respect to 
$S \cup \{z\}$). Note that a minimal cover $S$ is a $(1, |S| - 1)$-configuration, 
and vice versa, a $(1,k)$-configuration inequality (with respect to 
$S \cup \{z\}$) that satisfies $k = |S|$ is a minimal cover inequality. In [58] 
it was shown that the $(1,k)$-configuration inequality defines a facet of 
$P_K(S \cup \{z\}, f, F)$. 

Inequalities derived from both covers and $(1,k)$-configurations are special 
cases of extended weight inequalities that have been introduced by Weismantel 
[68]. Consider a subset $T \subseteq N$ with $f(T) < F$ and let 
$r := F - f(T)$. The inequality 

$$\sum_{i \in T} f_i x_i + \sum_{i \in N \setminus T} \max(f_i - r, 0)\, x_i \le f(T) \tag{20}$$

is called weight inequality with respect to $T$. It is valid for 
$P_K(N, f, F)$. The name weight inequality reflects that the coefficients of 
the items in $T$ equal their original weights and the number $r := F - f(T)$ 
corresponds to the residual capacity of the knapsack when $x_i = 1$ for all 
$i \in T$. There is a natural way to extend weight inequalities by (i) 
replacing the original weights of the items by relative weights and (ii) 
resorting to the method of sequential lifting. 

Consider again some subset $T \subseteq N$ with $f(T) < F$, let 
$r = F - f(T)$ and denote by $S$ the subset of $N \setminus T$ such that 
$f_i \ge r$ for all $i \in S$. The (uniform) extended weight inequality 
associated with $T$ and some permutation $\pi_1, \ldots, \pi_{|S|}$ of the set 
$S$ is of the form 

$$\sum_{i \in T} x_i + \sum_{i \in S} w_i x_i \le |T|, \tag{21}$$

where $w_i$, $i \in S$, are the lifting coefficients obtained by applying 
Algorithm 2 (see the subsection on lifting below). These (uniform) extended 
weight inequalities subsume the family of minimal cover and 
$(1,k)$-configuration inequalities. They can be generalized to inequalities 
with arbitrary weights in the starting set $T$, see [68]. 
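
For small instances the basic weight inequality (20) can be generated and checked by brute force. The helper names and the four-item knapsack below are assumptions made purely for illustration; the enumeration is exponential and only meant as a sanity check.

```python
from itertools import product


def weight_inequality(f, F, T):
    """Coefficients and right-hand side of the weight inequality w.r.t. T."""
    f_T = sum(f[i] for i in T)
    if f_T >= F:
        raise ValueError("T must satisfy f(T) < F")
    r = F - f_T  # residual capacity once all items of T are packed
    coeffs = {i: (f[i] if i in T else max(f[i] - r, 0)) for i in f}
    return coeffs, f_T


def is_valid_for_knapsack(coeffs, rhs, f, F):
    """Check coeffs * x <= rhs on every 0/1 point with f * x <= F."""
    items = sorted(f)
    for x in product((0, 1), repeat=len(items)):
        if sum(f[i] * xi for i, xi in zip(items, x)) <= F:
            if sum(coeffs[i] * xi for i, xi in zip(items, x)) > rhs:
                return False
    return True


# Hypothetical knapsack 2x1 + 3x2 + 4x3 + 6x4 <= 7 with T = {1, 2}, r = 2.
f = {1: 2, 2: 3, 3: 4, 4: 6}
F = 7
coeffs, rhs = weight_inequality(f, F, T={1, 2})
valid = is_valid_for_knapsack(coeffs, rhs, f, F)
```

The resulting inequality is $2x_1 + 3x_2 + 2x_3 + 4x_4 \le 5$: items outside $T$ only pay for the part of their weight that exceeds the residual capacity.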

The separation of minimal cover inequalities is widely discussed in the 
literature. The complexity of cover separation has been investigated in [?], 
whereas algorithmic and implementational issues are treated, among others, in 
[?]. The ideas and concepts suggested to separate cover inequalities basically 
carry over to extended weight inequalities. Typical features of a separation 
algorithm for cover inequalities are: fix all variables that are integer, find 
a cover (in the extended weight case some subset $T$), and lift the remaining 
variables sequentially. 

Cutting planes derived from knapsack relaxations can sometimes be strengthened 
if special ordered set (SOS) inequalities $\sum_{i \in Q} x_i \le 1$ for some 
$Q \subseteq N$ are available. In connection with a knapsack inequality these 
constraints are also called generalized upper bound constraints (GUBs). It is 
clear that by taking the additional SOS constraints into account stronger 
cutting planes may be derived. This possibility has been studied in 
[20,41,71,54,33]. 



Lifting 



As outlined at the beginning of this section and as observed in the description 
and separation of knapsack inequalities, we are often faced with the following 
problem. 

We are given an inequality $\sum_{i \in I} \alpha_i x_i \le \alpha_0$ that is 
valid for 
$P_{IP} \cap \{x \in \mathbb{R}^n : x_j = 0 \text{ for all } j \in N \setminus I\}$ 
for some $I \subseteq N$. We would like to extend this inequality to a valid 
inequality of $P_{IP}$ and, if possible, in such a way that it induces a high 
dimensional face of $P_{IP}$. Or, in case 
$\sum_{i \in I} \alpha_i x_i \le \alpha_0$ is already facet-defining for the 
subpolytope (as for instance the minimal cover inequality for $P_K(S, f, F)$), 
we would like to extend the inequality to a facet-defining inequality of 
$P_{IP}$ (in the minimal cover case to a facet-defining inequality of 
$P_K(N, f, F)$). One way to solve this problem is the method of sequential 
lifting, see [?]. The algorithm proceeds in an iterative fashion. Step by step 
it takes a variable $i \in N \setminus I$ into account, computes an appropriate 
coefficient $\alpha_i$ for this variable, and iterates. We assume in the 
following that $\pi_1, \ldots, \pi_{n - |I|}$ is a permutation of the items in 
$N \setminus I$. 



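Algorithm 2. (Sequential lifting) 

(1) For $k = 1$ to $n - |I|$ perform the following steps: 

(2) For $l = 1$ to $u_{\pi_k}$ perform the following steps: 

    Compute 

    $$\begin{array}{rl}
    \gamma(k, l) = \max & \sum_{i \in I} \alpha_i x_i + \sum_{i \in \{\pi_1, \ldots, \pi_{k-1}\}} \alpha_i x_i \\
    \text{s.t.} & \sum_{i \in I} A_{\cdot i} x_i + \sum_{i \in \{\pi_1, \ldots, \pi_{k-1}\}} A_{\cdot i} x_i + A_{\cdot \pi_k} l \le b \\
    & 0 \le x_i \le u_i,\ x_i \in \mathbb{Z} \text{ for } i \in I \cup \{\pi_1, \ldots, \pi_{k-1}\}.
    \end{array}$$

(3) End(For) 

(4) Set $\alpha_{\pi_k} = \min_{l = 1, \ldots, u_{\pi_k}} \frac{\alpha_0 - \gamma(k, l)}{l}$. 

(5) End(For) 

(6) Stop. 

It can be shown by induction on $k$ that the output 
$\sum_{i \in N} \alpha_i x_i \le \alpha_0$ of this algorithm is a valid 
inequality for $P_{IP}$. In case, for some 
$k \in \{\pi_1, \ldots, \pi_{n - |I|}\}$, the integer program in (2) is 
infeasible, i.e., $\gamma(k, l) = -\infty$ for all $l = 1, \ldots, u_k$, we 
may assign any value to $\alpha_k$ and the inequality stays valid. In fact, the 
following result is true. 

Proposition 1. Let $I \subseteq N$ and 
$\sum_{i \in I} \alpha_i x_i \le \alpha_0$ be an inequality that defines a 
facet of 
$P_{IP} \cap \{x \in \mathbb{R}^n : x_j = 0 \text{ for all } j \in N \setminus I\}$. 
After applying Algorithm 2 the inequality 
$\sum_{i \in N} \alpha_i x_i \le \alpha_0$ defines a facet of $P_{IP}$. 
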
The inequality that results from applying the lifting procedure depends on the 
permutation of the items in the set $N \setminus I$. 

Example 1. Consider the knapsack polyhedron 

$$P_{IP} = \operatorname{conv}\{x \in \{0, 1\}^6 : 5x_1 + 5x_2 + 5x_3 + 5x_4 + 3x_5 + 8x_6 \le 17\}.$$

The inequality $x_1 + x_2 + x_3 + x_4 \le 3$ is valid for 
$P_{IP} \cap \{x_5 = x_6 = 0\}$. Choosing the permutation $(5, 6)$ yields the 
inequality $x_1 + x_2 + x_3 + x_4 + x_5 + x_6 \le 3$. If one chooses the 
permutation $(6, 5)$ of the items 5 and 6, the resulting inequality reads 
$x_1 + x_2 + x_3 + x_4 + 2x_6 \le 3$. Both inequalities are facet-defining for 
$P_{IP}$. 
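
Example 1 can be reproduced with a brute-force version of sequential lifting specialised to a single 0/1 knapsack row. This is a sketch under stated assumptions: the function name `lift_knapsack` is hypothetical, the lifting subproblems are solved by plain enumeration (so it only scales to tiny instances), and since all variables are binary only the value $l = 1$ has to be considered.

```python
from itertools import product


def lift_knapsack(f, F, alpha, alpha0, order):
    """Sequentially lift sum_i alpha_i x_i <= alpha0 for sum_i f_i x_i <= F.

    alpha holds the coefficients of the starting inequality; the items in
    order are lifted one after the other.
    """
    alpha = dict(alpha)
    for k in order:
        items = list(alpha)  # variables already present in the inequality
        gamma = None         # gamma(k, 1): best left-hand side with x_k = 1
        for x in product((0, 1), repeat=len(items)):
            if sum(f[i] * xi for i, xi in zip(items, x)) + f[k] <= F:
                value = sum(alpha[i] * xi for i, xi in zip(items, x))
                gamma = value if gamma is None else max(gamma, value)
        # If x_k = 1 is infeasible any coefficient is valid; 0 is used here.
        alpha[k] = 0 if gamma is None else alpha0 - gamma
    return alpha


f = {1: 5, 2: 5, 3: 5, 4: 5, 5: 3, 6: 8}
start = {1: 1, 2: 1, 3: 1, 4: 1}
lifted_56 = lift_knapsack(f, 17, start, 3, order=(5, 6))
lifted_65 = lift_knapsack(f, 17, start, 3, order=(6, 5))
```

The two calls return the coefficients of the two inequalities stated in Example 1, which makes the dependence on the lifting order visible: lifting item 6 first gives it coefficient 2 and leaves nothing for item 5.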

Note that in order to perform the lifting procedure one needs to solve a 
couple of integer programs that, at first sight, appear as difficult as the 
original problem. Sometimes they are not. For instance, if the integer program 
is a 0/1 knapsack problem and the starting inequality 
$\sum_{i \in I} \alpha_i x_i \le \alpha_0$ is a minimal cover or 
$(1,k)$-configuration inequality, the lifting coefficients can be computed in 
polynomial time, see [?]. Sometimes it is possible to determine the exact 
lifting coefficient without solving integer programs, as was observed by Balas 
[1] for minimal cover inequalities and extended by Weismantel [68] to extended 
weight inequalities. It is however true that for many general mixed integer 
programs the lifting procedure can hardly be implemented in the way we 
presented it, because computing the coefficients step by step is just too 
expensive. In such cases, one resorts to lower bounds on the coefficients that 
one obtains from heuristics. Another way is to look for conditions under which 
simultaneous lifting of variables is possible. This leads to the study of 
superadditive functions [?,35]. 

We note that lifting can, of course, also be applied if a variable $x_i$ is 
currently at its upper bound $u_i$. In this case, we first “complement” 
variable $x_i$ by replacing it by $u_i - x_i$, apply the same Algorithm 2 and 
resubstitute the variable afterwards. Lifting (sequential or simultaneous) has 
also been applied to general mixed integer programs, see, for instance, [?], 
or in connection with lift-and-project cuts, see [7,5] and [→ Balas]. 

Computational results about the success of knapsack inequalities with or 
without GUB constraints are given, for instance, in [20,11,19,33,49]. The 
papers consistently show that knapsack cuts are crucial for the solution of 
integer programs that contain knapsack problems as a substructure. 

Of course, knapsack relaxations are not the only ones considered in mixed 
integer programming solvers. An analysis of other important relaxations of an 
integer program makes it possible to incorporate odd hole and clique 
inequalities for the stable set polyhedron [?] or flow cover inequalities for 
certain mixed integer models [?]. Further recent examples of this second 
approach are given in [15,48]. More than one knapsack constraint at a time is 
considered in [?]. Cordier et al. [?] give a nice survey on which of the 
mentioned cutting planes help to solve which problems from the Miplib. A 
comprehensive survey on cutting planes used to solve integer and mixed integer 
programs is given in [?]. 

6 Conclusions 

In this paper we have discussed the basic features of current branch-and-cut 
algorithms for the solution of mixed integer programs. In particular, we have 
seen that preprocessing, though most of the ideas are straightforward, is often 
very important to solve certain mixed integer programs. We have also observed 
that there are various alternative and better strategies for node and variable 
selection within the branch-and-bound enumeration scheme than the classical 
choices of selecting some node deepest in the tree and selecting a variable 
closest to one half. We also got to know some cutting planes that are 
incorporated into today’s software. Of course, we could only scratch the 
surface of these topics in this survey. The interested reader is most welcome 
to go deeper into the field through the cited literature. 

Abstract. This is an overview of the significance and main uses of projection 
and lifting in integer programming and combinatorial optimization. Its first 
four sections deal with those basic properties of projection that make it such 
an effective and useful bridge between problem formulations in different 
spaces, i.e. different sets of variables. They discuss topics like the 
integrality-preserving property of projection, the dimension of projected 
polyhedra, when facets of a polyhedron project into facets of its projection, 
and so on. They also describe the use of projection for comparing the strength 
of different formulations of the same problem, and for proving the integrality 
of polyhedra by using extended formulations. The next section of the survey 
deals with disjunctive programming, or optimization over unions of polyhedra, 
whose most important incarnation are mixed 0-1 programs and their partial 
relaxations. It discusses the compact representation of the convex hull of a 
union of polyhedra through extended formulation, the connection between the 
projection of the latter and the polar of the convex hull, as well as the 
sequential convexification of facial disjunctive programs, among them mixed 0-1 
programs, with the related concept of disjunctive rank. Finally, the last two 
sections review the recent developments in disjunctive programming, namely 
lift-and-project cuts, the construction of cut generating linear programs, 
techniques for lifting and for strengthening disjunctive cuts, and the 
embedding of these cuts into a branch-and-cut framework, along with a brief 
outline of computational results. 



1 Basic Properties of Projection 

Most combinatorial optimization problems tend to have several alternative for- 
mulations, some of which are easier to handle than others. Projection provides a 
connection between different formulations. Since most combinatorial optimiza- 
tion problems are solved by some combination of polyhedral methods and enu- 
meration, and since the latter typically involves repeated solution of variants of 
the problem’s linear programming relaxation, it is crucial to be able to compare 
the strength of the LP relaxations of different formulations of the same problem. 
Given a polyhedron of the form 

$$Q := \{(u, x) \in \mathbb{R}^p \times \mathbb{R}^q : Au + Bx \le b\},$$

where $A$, $B$ and $b$ have $m$ rows, the projection of $Q$ onto 
$\mathbb{R}^q$, or onto the $x$-space, is defined as 

$$\operatorname{Proj}_x(Q) := \{x \in \mathbb{R}^q : \exists u \in \mathbb{R}^p : (u, x) \in Q\}.$$

Thus projecting the set $Q$ onto $\mathbb{R}^q$ can be viewed as finding a 
matrix $C \in \mathbb{R}^{m' \times q}$ and a vector $d \in \mathbb{R}^{m'}$ 
such that 

$$\operatorname{Proj}_x(Q) = \{x \in \mathbb{R}^q : Cx \le d\}.$$

M. Jünger and D. Naddef (Eds.): Computat. Comb. Optimization, LNCS 2241, pp. 26-56, 2001. 

This task can be accomplished as follows. Call the polyhedral cone 

$$W := \{v : vA = 0,\ v \ge 0\}$$

the projection cone associated with $\operatorname{Proj}_x(Q)$. Then we have 

Theorem 1. 

$$\operatorname{Proj}_x(Q) = \{x \in \mathbb{R}^q : (vB)x \le vb,\ v \in \operatorname{extr} W\},$$

where $\operatorname{extr} W$ denotes the set of extreme rays of $W$. 

Proof. $W$ is a pointed cone (since $W \subseteq \mathbb{R}^m_+$), hence it is 
the conical hull of its extreme rays. Therefore the system of inequalities 
defining $\operatorname{Proj}_x(Q)$ holds for all $v \in \operatorname{extr} W$ 
if and only if it holds for all $v \in W$. 

Now let $x \in \operatorname{Proj}_x(Q)$. Then there exists 
$u \in \mathbb{R}^p$ such that $Au + Bx \le b$. Premultiplying this system with 
any $v \in W$ shows that $x$ satisfies $(vB)x \le vb$ for all $v \in W$. 

Conversely, suppose $x \in \mathbb{R}^q$ satisfies $(vB)x \le vb$ for all 
$v \in W$. Then there exists no $v \ge 0$ satisfying $vA = 0$ and 
$v(Bx - b) > 0$. But then, by Farkas’ Lemma, there exists $u$ satisfying 
$Au \le b - Bx$, i.e. $x \in \operatorname{Proj}_x(Q)$. 

□ 



Several comments are in order. 

First, if some of the inequalities defining $Q$ are replaced with equations, 
the corresponding components of $v$ become unrestricted in sign. In such a 
case, if the projection cone $W$ is not pointed, then it is not the conical 
hull of its extreme rays. Nevertheless, like any polyhedral cone, $W$ is 
finitely generated, and any finite set of its generators can play the role 
previously assigned to $\operatorname{extr} W$. 

Second, the set $Q$ may be defined, besides the system of inequalities 
involving the variables $u$ to be projected out, by any additional constraints 
(not necessarily linear) not involving $u$: such constraints will become part 
of the projection, without any change. In other words, for an arbitrary set 
$S \subseteq \mathbb{R}^q$, the projection of 

$$Q := \{(u, x) \in \mathbb{R}^p \times \mathbb{R}^q : Au + Bx \le b,\ x \in S\}$$

on $\mathbb{R}^q$ is 

$$\operatorname{Proj}_x(Q) := \{x \in \mathbb{R}^q : (vB)x \le vb \text{ for all } v \in W,\ x \in S\},$$

where $W$ is the projection cone introduced earlier. 

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Third, Proj,j,((5) as given by Theorem [T] may have redundant inequalities, 
even when the latter come only from extreme rays of IT; i.e., an extreme ray of 
W does not necessarily give rise to a facet of Proj,j.(Q)- It would be nice to be 
able to identify the extreme rays of W which give rise to facets of Proj,j.(Q), but 
unfortunately this in general is not possible: whether an inequality (vB)x < vb 
is or is not facet defining for Proj 3 .(Q) depends on B and b as well as on v. We 
will return to this question later. 

Fourth, we have the simple fact, following from the definitions, that if $x \in \mathbb{R}^q$
is an extreme point of $\mathrm{Proj}_x(Q)$, then there exists $u \in \mathbb{R}^p$ such that $(u, x)$ is an
extreme point of $Q$. This has the following important consequence:

Proposition 1. If $Q$ is an integral polyhedron, then $\mathrm{Proj}_x(Q)$ is an integral
polyhedron.



Well Known Special Cases. There are two well known algorithms whose 
iterative steps are special cases of projection: 

If the matrix $A$ in the definition of $Q$ has a single column $a$, i.e., if

$$Q := \{(u_0, x) : a u_0 + Bx \le b\},$$

then

$$\mathrm{extr}\, W = \{v^k : a_k = 0\} \cup \{v^{ij} : a_i \cdot a_j < 0\},$$

where

$$v^k := e_k, \qquad v^{ij} := a_i e_j - a_j e_i \quad (\text{with } a_i > 0 > a_j),$$

with $e_k$ the $k$-th unit vector, and we have one step of Fourier Elimination (see
e.g. [?]).
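The elimination step just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `fourier_step` and the row-wise input layout are hypothetical choices, not notation from the text, and exact rational arithmetic is used to avoid rounding. The rays $v^k = e_k$ keep the rows in which $u_0$ does not appear, and each ray $v^{ij}$ combines one row with positive coefficient and one with negative coefficient so that $u_0$ cancels.

```python
from fractions import Fraction

def fourier_step(a, B, b):
    """Eliminate the single variable u0 from the system  a*u0 + B x <= b.

    a: the m coefficients of u0, B: m rows of q coefficients, b: m right-hand sides.
    Returns (C, d) such that the projection is {x : C x <= d}.
    (Hypothetical helper; redundant rows are not removed.)
    """
    a = [Fraction(t) for t in a]
    B = [[Fraction(t) for t in row] for row in B]
    b = [Fraction(t) for t in b]
    m = len(a)
    rays = []
    # Rays v^k = e_k: rows in which u0 has coefficient zero are kept unchanged.
    for k in range(m):
        if a[k] == 0:
            v = [Fraction(0)] * m
            v[k] = Fraction(1)
            rays.append(v)
    # Rays v^{ij}: pair a row with a_i > 0 and a row with a_j < 0 so that v.a = 0.
    for i in range(m):
        for j in range(m):
            if a[i] > 0 and a[j] < 0:
                v = [Fraction(0)] * m
                v[i] = -a[j]
                v[j] = a[i]
                rays.append(v)
    q = len(B[0])
    C = [[sum(v[r] * B[r][c] for r in range(m)) for c in range(q)] for v in rays]
    d = [sum(v[r] * b[r] for r in range(m)) for v in rays]
    return C, d

# Rows: -u - x <= -1,  -x <= 0,  x <= 2,  -u <= 0,  u <= 1  (u is eliminated).
C, d = fourier_step([-1, 0, 0, -1, 1], [[-1], [-1], [1], [0], [0]], [-1, 0, 2, 0, 1])
```

In this small instance the resulting rows describe the interval $0 \le x \le 2$, together with one repeated row and one trivial row, which illustrates that the rays of $W$ need not all yield nonredundant inequalities.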

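If $Q$ is of the form

$$Q := \left\{(u, x, x_0) : \begin{array}{l} -cu - dx + x_0 \le 0 \\ A'u + B'x \le b' \end{array}\right\},$$

then

$$W = \{(v_0, v) : -v_0 c + vA' = 0;\ v_0, v \ge 0\},$$

$$\mathrm{Proj}_{(x_0, x)}(Q) := \left\{(x_0, x) : \begin{array}{ll} x_0 + (vB' - d)x \le vb', & \forall v : (1, v) \in \mathrm{extr}\, W \\ (vB')x \le vb', & \forall v : (0, v) \in \mathrm{extr}\, W \end{array}\right\}$$

and we have the constraint set of the Benders Master Problem (see e.g. [33]).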



Projection and Restriction. Projection is not to be confused with restriction, 
another operation that relates a higher dimensional object to one in a subspace. 
Referring to the same set $Q \subseteq \mathbb{R}^p \times \mathbb{R}^q$, its restriction to the subspace defined
by $u = 0$ is

$$\{(u, x) \in Q : u = 0\} = \{x \in \mathbb{R}^q : (0, x) \in Q\}.$$

Notice that, unlike in the case of projection, the restriction of $Q$ "to $\mathbb{R}^q$," or
"to the $x$-space," is not well defined, since it may mean several things, of which
the above is just one. Another one is, for some $t \in \mathbb{R}^p$,

$$\{(u, x) \in Q : u = t\} = \{x \in \mathbb{R}^q : (t, x) \in Q\}.$$

Clearly, the restriction of $Q$ through some value assignment to each component
of $u$ is always a subset of $\mathrm{Proj}_x(Q)$.




Fig. 1. Panels: (a) $Q$; (b) $\mathrm{Proj}_x(Q)$; (c) $\{(x, u) \in Q : u = 0\}$, each drawn over a horizontal axis marked 0, 1, 2.



To illustrate the difference between projection and restriction, consider the
set $Q := \{(x, u) \in \mathbb{R}^2 : x + u \ge 1,\ 0 \le x \le 2,\ 0 \le u \le 1\}$. Its projection on the
$x$-space is $\{x \in \mathbb{R} : 0 \le x \le 2\}$, whereas its restriction to the subspace of $\mathbb{R}^2$
defined by $u = 0$ is $\{x \in \mathbb{R} : 1 \le x \le 2\}$, as shown in Fig. 1.
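The two intervals in this example can be reproduced numerically. The sketch below is an illustration only, with hypothetical helper names (`vertices`, `projection_on_x`, `restriction_on_x`) and a simplifying assumption: the set is a bounded polyhedron in the plane, so its projection is the interval spanned by the $x$-coordinates of its vertices.

```python
from fractions import Fraction
from itertools import combinations

# Each row (cx, cu, rhs) encodes cx*x + cu*u <= rhs.
Q = [(-1, -1, -1),  # x + u >= 1
     (-1, 0, 0),    # x >= 0
     (1, 0, 2),     # x <= 2
     (0, -1, 0),    # u >= 0
     (0, 1, 1)]     # u <= 1

def vertices(rows):
    """Vertices of a bounded 2-D polyhedron: feasible intersections of two boundary lines."""
    pts = set()
    for (a1, b1, c1), (a2, b2, c2) in combinations(rows, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundary lines do not meet in a single point
        x = Fraction(c1 * b2 - c2 * b1, det)  # Cramer's rule
        u = Fraction(a1 * c2 - a2 * c1, det)
        if all(a * x + b * u <= c for a, b, c in rows):
            pts.add((x, u))
    return pts

def projection_on_x(rows):
    """Projection of a bounded set onto the x-axis, as an interval (lo, hi)."""
    xs = [x for x, _ in vertices(rows)]
    return min(xs), max(xs)

def restriction_on_x(rows, t):
    """x-interval of {(x, u) in Q : u = t}, or None if that set is empty."""
    lo, hi = None, None
    for a, b, c in rows:
        rhs = Fraction(c) - b * t
        if a > 0:
            hi = rhs / a if hi is None else min(hi, rhs / a)
        elif a < 0:
            lo = rhs / a if lo is None else max(lo, rhs / a)
        elif rhs < 0:
            return None  # a row not involving x is violated by u = t
    if lo is not None and hi is not None and lo > hi:
        return None
    return lo, hi
```

With these definitions, `projection_on_x(Q)` spans $[0, 2]$ while `restriction_on_x(Q, 0)` spans $[1, 2]$; fixing $u$ at other values gives other restrictions, each contained in the projection.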

While both projection and restriction map polyhedra from a higher dimen- 
sional space to a lower dimensional one, sometimes we want to go into the oppo- 
site direction. The operation, in a sense reverse to projection, in which one goes 
from a given polyhedral formulation of a problem to a higher dimensional one, 
involving some new variables, is called extended formulation. Sometimes going 
to an extended formulation is referred to as lifting. For instance, many problems 
defined on graphs that are usually formulated in terms of arc variables, can also 
be formulated in the higher dimensional space of arc- and node- variables; and we 
will see examples where this is advantageous. Typically, extended formulations 
are not unique, and it takes some insight to recognize those which offer some 
advantage. One possible advantage might be that the extended formulation gives 
rise to a polyhedron with desirable properties. 

The reverse of restriction of course is relaxation, i.e. the removal of the
restricting constraint. But this operation becomes less trivial than it sounds when
looked at in the following context. Suppose we know a valid inequality, $\alpha x \le \alpha_0$,
for a polyhedron $P \cap S$ that is a restriction to a subspace $S$ of a higher dimensional
polyhedron $P$, and we would like to calculate the strongest (tightest) extension
of the inequality, say $\alpha x + \beta y \le \alpha_0$, to $P$. This is a well defined and important
operation called lifting. It consists of calculating the missing coefficients $\beta_j$ of the
inequality $\alpha x + \beta y \le \alpha_0$ valid for $P$. There is a well known procedure for doing
this, called sequential lifting (see e.g. [?]), which calculates the coefficients $\beta_j$
one by one, and under certain conditions guarantees that if $\alpha x \le \alpha_0$ defines a
facet of $P \cap S$, then $\alpha x + \beta y \le \alpha_0$ defines a facet of $P$.

Exercise 1. Consider the following variations of $Q$:

$Q_1 := \{(u, x) \in \mathbb{R}^p \times \mathbb{R}^q : Au + Bx = b\}$

$Q_2 := \{(u, x) \in \mathbb{R}^p \times \mathbb{R}^q : Au + Bx \le b,\ u \ge 0\}$

$Q_3 := Q$ with $B = \binom{I}{0}$, where $I$ is the $q \times q$ identity matrix and $0$ is
the $(m - q) \times q$ zero matrix (assume $m > q$).

For each $Q_i$, $i = 1, 2, 3$, write down the projection cone associated with $\mathrm{Proj}_x(Q_i)$.

Exercise 2. Prove Proposition 1.



2 Dimensional Aspects of Projection 

When we have a polyhedral representation of some combinatorial object, we 
would like it to be nonredundant. This makes us interested in identifying among 
the inequalities defining the polyhedron, say Q, those which are facet inducing, 
since the latter provide a minimal representation. Further, when we project Q 
onto a subspace, we would like to know whether the facets of Q project into 
facets of the projection. To be able to answer this and similar questions, we 
need to look at the relationship between the dimension of a polyhedron and the 
dimension of its projections. This was done in the paper on which most of 
this section is based. 

Consider the polyhedron $Q := \{(u, x) \in \mathbb{R}^p \times \mathbb{R}^q : Au + Bx \le b\} \ne \emptyset$,
where $A$, $B$ and $b$ have $m$ rows, and let us partition $(A, B, b)$ into $(A^=, B^=, b^=)$
and $(A^{\le}, B^{\le}, b^{\le})$, where $A^= u + B^= x = b^=$ is the equality subsystem of $Q$, the
set of equations corresponding to the inequalities satisfied at equality by every
$(u, x) \in Q$. W.l.o.g., we may assume that the equality subsystem is of full row
rank (otherwise the redundant rows can be removed), and let $r := \mathrm{rank}(A^=, B^=)$
$= \mathrm{rank}(A^=, B^=, b^=)$, where the last equality follows from $Q \ne \emptyset$.

Let $\dim(Q)$ denote the dimension of $Q$. It is well known that $\dim(Q) =
p + q - r$, and that $Q$ is full-dimensional, i.e. $\dim(Q) = p + q$, if and only if the
equality subsystem is vacuous. When this is the case, then $\dim(\mathrm{Proj}_x(Q)) = q$,
for otherwise $\mathrm{Proj}_x(Q)$ has a nonempty equality subsystem that must also be
valid for $Q$, contrary to the assumption that the latter is full dimensional.

Consider now the general situation, when $Q$ is not necessarily full dimensional.
Recall that $r = \mathrm{rank}(A^=, B^=)$, and define $r^* := \mathrm{rank}(A^=)$. Clearly,
$0 \le r^* \le \min\{r, p\}$. It can then be shown that

Theorem 2. [12] $\dim(\mathrm{Proj}_x(Q)) = \dim(Q) - p + r^*$.

It follows that $\dim(\mathrm{Proj}_x(Q)) = \dim(Q)$ if and only if $r^* = p$, i.e., the
projection operation is dimension-preserving if and only if the matrix $A^=$ is of
full column rank.
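As a small numerical illustration of the dimension formula, the sketch below computes $r$ and $r^*$ by exact Gaussian elimination and evaluates $\dim(Q) = p + q - r$ and $\dim(Q) - p + r^*$. The function names and the argument names `A_eq`, `B_eq` (standing for $A^=$, $B^=$) are hypothetical; the equality subsystem is assumed to be given explicitly rather than detected from the inequalities.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in rows]
    ncols = len(M[0]) if M else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

def projected_dimension(A_eq, B_eq, p, q):
    """Return (dim Q, dim Proj_x(Q)) from the equality subsystem A_eq u + B_eq x = b_eq."""
    r = rank([ra + rb for ra, rb in zip(A_eq, B_eq)])  # r  = rank(A^=, B^=)
    r_star = rank(A_eq)                                 # r* = rank(A^=)
    dim_Q = p + q - r
    return dim_Q, dim_Q - p + r_star

# Q = {(u, x) : u - x = 0, 0 <= x <= 1}: a segment whose projection is a segment.
print(projected_dimension([[1]], [[-1]], 1, 1))
# Q = {(u, x) : x = 0, 0 <= u <= 1}: a segment whose projection is a single point.
print(projected_dimension([[0]], [[1]], 1, 1))
```

In the first instance $A^=$ has full column rank and the dimension is preserved; in the second $r^* = 0$ and one dimension is lost.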
When Is the Projection of a Facet a Facet of the Projection? We now
turn to the question of what happens to the facets of $Q$ under projection. Since
a facet of a polyhedron is itself a polyhedron, the dimension of its projection can
be deduced from Theorem 2.

Let $\alpha u + \beta x \le \beta_0$ be a valid inequality for $Q$, and suppose

$$F := \{(u, x) \in Q : \alpha u + \beta x = \beta_0\}$$

is a facet of $Q$. Let $\binom{A^=}{\alpha} u + \binom{B^=}{\beta} x = \binom{b^=}{\beta_0}$ be the equality subsystem of $F$,
and let $r_F := \mathrm{rank}\begin{pmatrix} A^= & B^= \\ \alpha & \beta \end{pmatrix}$. Notice that $r_F - r = 1$, since by assumption
$\dim(F) = \dim(Q) - 1$. Further, denote $r^*_F := \mathrm{rank}\binom{A^=}{\alpha}$. We may interpret
$r^*_F - r^*$ as the difference between the number of dimensions "lost" in projecting
$Q$ (which is $p - r^*$) and in projecting $F$ (which is $p - r^*_F$). From Theorem 2 we
then have

Corollary 1. $\dim(\mathrm{Proj}_x(F)) = \dim(\mathrm{Proj}_x(Q)) - 1 + (r^*_F - r^*)$.

Indeed, $\dim(\mathrm{Proj}_x(F)) = \dim(F) - p + r^*_F = \dim(Q) - 1 - p + r^*_F$, and
substituting for $\dim(Q)$ its value given by Theorem 2 yields the Corollary.

We are now in a position to state what happens to the facets of Q under 
projection. 

Corollary 2. Let $F$ be a facet of $Q$. Then $\mathrm{Proj}_x(F)$ is a facet of $\mathrm{Proj}_x(Q)$ if
and only if $r^*_F = r^*$.

Proof outline. From Corollary 1 it is clear that $\mathrm{Proj}_x(F)$ has the dimension of a
facet of $\mathrm{Proj}_x(Q)$ if and only if $r^*_F = r^*$. However, this by itself does not amount
to a proof, unless we can guarantee that $\mathrm{Proj}_x(F)$ is a face of $\mathrm{Proj}_x(Q)$, which
is far from obvious: in general, the projection of a face need not be a face of the
projection. Indeed, think of a 3-dimensional pyramid, and its projection on its
base: neither the top vertex of the pyramid, nor the 3 edges incident with it,
become faces of the projection as a result of the operation.

However, if $\mathrm{Proj}_x(F)$ is a set of the form $\{x \in \mathrm{Proj}_x(Q) : \beta x = \beta_0\}$ for
some valid inequality $\beta x \le \beta_0$ for $\mathrm{Proj}_x(Q)$, then clearly $\mathrm{Proj}_x(F)$ is a face of
$\mathrm{Proj}_x(Q)$. This is the case in our situation: since $F$ is a facet of $Q$, it has a
defining inequality $\alpha u + \beta x \le \beta_0$. Now if $r^*_F = r^*$, then there exists a vector $\lambda$
such that $\alpha = \lambda A^=$; hence subtracting $\lambda A^= u + \lambda B^= x = \lambda b^=$ from this defining
inequality yields $(\beta - \lambda B^=)x \le \beta_0 - \lambda b^=$, which can be written as $\beta' x \le \beta'_0$.
Thus $\mathrm{Proj}_x(F) = \{x \in \mathrm{Proj}_x(Q) : \beta' x = \beta'_0\}$, i.e. $\mathrm{Proj}_x(F)$ is a face of $\mathrm{Proj}_x(Q)$;
hence a facet.

□

Another consequence of Theorem 2 which is not hard to prove is this:

Corollary 3. Let $r^* = p$, i.e. let $A^=$ be of full column rank, and let $0 \le d \le
\dim(Q) - 1$. Then every $d$-dimensional face of $Q$ projects into a $d$-dimensional
face of $\mathrm{Proj}_x(Q)$.






E. Balas 



Projection with a Minimal System of Inequalities. As discussed in section 1,
the inequalities of the system defining $\mathrm{Proj}_x(Q)$ are not necessarily facet
inducing, even though they are in 1-1 correspondence with the extreme rays of
the projection cone $W$. In other words, the system $(vB)x \le vb$, $\forall v \in \mathrm{extr}\, W$,
is not necessarily minimal. In fact, this system often contains a large number of
redundant inequalities. It has been shown however [?], that if a certain linear
transformation is applied to $Q$, which replaces the coefficient matrix $B$ of $x$ with
the identity matrix $I$, the resulting polyhedron $Q^0$ has a projection cone $W^0$
that is pointed, and the projection of $Q^0$ is of the form

$$\mathrm{Proj}_x(Q^0) = \{x \in \mathbb{R}^q : vx \le v_0 \text{ for all } (v, v_0) \text{ such that } (v, w, v_0) \in \mathrm{extr}\, W^0\},$$

with $\mathrm{Proj}_x(Q^0) = \mathrm{Proj}_x(Q)$. Here $w$ is a set of auxiliary variables generated by
the above transformation. Furthermore, we have the following

Proposition 2. Let $\mathrm{Proj}_x(Q)$ be full dimensional. Then the inequality $vx \le
v_0$ defines a facet of $\mathrm{Proj}_x(Q)$ if and only if $(v, v_0)$ is an extreme ray of the cone
$\mathrm{Proj}_{(v, v_0)}(W^0)$.

The linear transformation that takes $Q$ into $Q^0$ is of low complexity (namely
polynomial in $\max\{m, q\}$, where $B$ is $m \times q$). The need to project the cone $W^0$ onto the
subspace $(v, v_0)$ arises only when $B$ is not of full row rank. In the important case
when $B$ is of full row rank, we have the following stronger result:

Corollary 4. [?] Let $\mathrm{Proj}_x(Q)$ be full dimensional and $\mathrm{rank}(B) = m$. Then the
inequality $vx \le v_0$ defines a facet of $\mathrm{Proj}_x(Q)$ if and only if $(v, v_0)$ is an extreme
ray of $W^0$.

In many important cases the matrix $B$ is of the form $B = \binom{I}{0}$, which voids
the need for the transformation discussed above and leads to a projection cone
whose extreme rays yield facet inducing inequalities for $\mathrm{Proj}_x(Q)$. This is the
case encountered, for instance, in the characterization of the perfectly matchable
subgraph polytope of a graph to be discussed in section 4; as well as in the
projection used in the convex hull characterization of a disjunctive set discussed
in section 5.

Exercise 3. Give an example of a 3-dimensional polyhedron $Q$ and one of its
2-dimensional projections $\mathrm{Proj}(Q)$ such that $\mathrm{Proj}(Q)$ has (a) fewer facets than $Q$;
(b) more facets than $Q$; (c) the same number of facets as $Q$ (visual representation
will suffice).

3 Comparing Different Formulations 

Combinatorial optimization problems tend to have several formulations, and
choosing the most convenient one is often a nontrivial exercise. While the number
of variables and constraints in the different formulations does have some
importance, the most relevant criterion of the comparison is the strength of the
linear programming relaxation. This is so because most solution procedures
involve some branch and bound component, and the bounding usually comes from
the LP relaxation: the tighter this relaxation, the stronger the bound.

Typically the comparisons are made between two problem formulations, say
$P$ and $Q$, such that $P$ is expressed in terms of variables $x \in \mathbb{R}^n$, while $Q$ is
in terms of variables $(x, y) \in \mathbb{R}^n \times \mathbb{R}^q$. To compare the strength of the two LP
relaxations, one has to express both $P$ and $Q$ in terms of the same variables,
and so one uses projection to eliminate $y$ and re-express $Q$ in terms of $x$, i.e., as
$\mathrm{Proj}_x(Q)$. One then compares the LP relaxation of this latter set to that of $P$.

Before we give several examples of how this can be done, we wish to point out 
that projection can be used to compare two different formulations of the same 
problem even when they are expressed in completely different (nonoverlapping) 
sets of variables, provided the two sets are related by an affine transformation. 
Suppose, for instance, that we have two equivalent formulations of a problem 
whose feasible sets are P and Q: 

$$P := \{x \in \mathbb{R}^n : Ax \le a_0\}, \qquad Q := \{y \in \mathbb{R}^q : By \le b_0\}$$

where $A$ is $m \times n$, $B$ is $p \times q$, and

$$x = Ty + c \qquad (1)$$

for some $T \in \mathbb{R}^{n \times q}$ and $c \in \mathbb{R}^n$.

Define

$$Q^+ := \{(x, y) \in \mathbb{R}^{n+q} : x - Ty = c,\ By \le b_0\}$$

and project $Q^+$ onto $\mathbb{R}^n$. Using the projection cone

$$W := \{(v, w) \in \mathbb{R}^{n+p} : -vT + wB = 0,\ w \ge 0\},$$

we obtain

$$\mathrm{Proj}_x(Q^+) := \{x \in \mathbb{R}^n : vx \le vc + wb_0,\ \forall (v, w) \in W\}.$$

Then to compare $P$ with $Q$, we compare $P$ with $\mathrm{Proj}_x(Q^+)$.

The affine transformation can also be used to derive a different analytical
comparison between $P$ and $Q$ that does not make use of projection [34].




The Traveling Salesman Problem. Consider the constraint set of the TSP
defined on the complete digraph $G$ on $n + 1$ nodes in two well known formulations:

Dantzig - Fulkerson - Johnson

$$\begin{array}{ll}
\sum (x_{ij} : j = 0, \ldots, n) = 1, & i = 0, \ldots, n \\
\sum (x_{ij} : i = 0, \ldots, n) = 1, & j = 0, \ldots, n \\
\sum (x_{ij} : i \in S,\ j \in S) \le |S| - 1, & S \subseteq \{0, \ldots, n\},\ 2 \le |S| \le \frac{n + 1}{2} \\
x_{ij} \in \{0, 1\}, & i, j = 0, \ldots, n
\end{array}$$

Miller - Tucker - Zemlin

$$\begin{array}{ll}
\sum (x_{ij} : j = 0, \ldots, n) = 1, & i = 0, \ldots, n \\
\sum (x_{ij} : i = 0, \ldots, n) = 1, & j = 0, \ldots, n \\
u_i - u_j + n x_{ij} \le n - 1, & i, j = 1, \ldots, n,\ i \ne j \\
x_{ij} \in \{0, 1\}, & i, j = 0, \ldots, n
\end{array}$$

The M-T-Z formulation introduces $n$ node variables, but replaces the exponentially
large set of subtour elimination constraints by $n(n - 1)$ new constraints
that achieve the same goal. However, projecting the constraint set of the M-T-Z
formulation into the subspace of the arc variables by using the cone

$$W = \{v \mid vA = 0,\ v \ge 0\}, \qquad (2)$$

where $A$ is the transpose of the node-arc incidence matrix of $G$, yields the inequalities

$$\sum (x_{ij} : (i, j) \in C) \le \frac{n - 1}{n}\,|C|$$

for every directed cycle $C$ of $G$. These inequalities are strictly weaker than the
corresponding subtour elimination inequalities of the D-F-J formulation, so the
latter provides a tighter LP relaxation.
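The gap between the two right-hand sides can be checked with a few lines of Python. The sketch below is illustrative only and uses hypothetical function names; it compares the bound obtained by summing the M-T-Z rows along a directed cycle with the subtour elimination bound, and verifies that a fractional 2-cycle satisfies the M-T-Z rows while violating the D-F-J inequality. The assignment constraints are not checked here.

```python
from fractions import Fraction

def mtz_cycle_bound(n, k):
    """Bound from summing the k M-T-Z rows along a directed k-cycle on nodes 1..n:
    the u-terms cancel, leaving n * sum(x) <= (n - 1) * k."""
    return Fraction((n - 1) * k, n)

def dfj_bound(k):
    """Right-hand side |S| - 1 of the subtour elimination inequality for |S| = k."""
    return k - 1

def mtz_rows_hold(n, x, u):
    """Check u_i - u_j + n*x_ij <= n - 1 for every stored arc (i, j)."""
    return all(u[i] - u[j] + n * xij <= n - 1 for (i, j), xij in x.items())

n = 4
x = {(1, 2): Fraction(3, 4), (2, 1): Fraction(3, 4)}  # fractional 2-cycle
u = {1: 0, 2: 0}
print(mtz_rows_hold(n, x, u))            # the two M-T-Z rows are satisfied
print(sum(x.values()) > dfj_bound(2))    # but the D-F-J inequality for S = {1, 2} is violated
```

For $k < n$ the difference between the two bounds is $1 - k/n > 0$, which is the sense in which the projected inequalities are weaker.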

The Set Covering Problem. Consider the set covering problem 

$$\min\{cx \mid Ax \ge e,\ x \in \{0, 1\}^n\} \qquad \text{(SC)}$$

where $e = (1, \ldots, 1)$ and $A$ is a matrix of 0's and 1's. Let $M$ and $N$ index the
rows and columns of $A$, respectively, and

$$M_j := \{i \in M \mid a_{ij} = 1\},\ j \in N, \qquad N_i := \{j \in N \mid a_{ij} = 1\},\ i \in M.$$

The following uncapacitated plant location (UPL) problem is known to be
equivalent to (SC):

$$\begin{array}{ll}
\min \sum (c_j x_j : j \in N) & \\
\sum (u_{ij} : j \in N_i) \ge 1, & i \in M \\
-\sum (u_{ij} : i \in M_j) + |M_j|\, x_j \ge 0, & j \in N \\
u_{ij} \ge 0,\ i \in M,\ j \in N; & x_j \in \{0, 1\},\ j \in N
\end{array}$$

If we now project this constraint set onto the $x$-space by using the cone

$$W = \{v \mid v_i - v_j \le 0,\ j \in N_i,\ i \in M;\ v_i \ge 0,\ i \in M\},$$

we obtain the inequalities

$$\begin{array}{ll}
\sum (|M_j|\, x_j : j \in N_S) \ge |S|, & \forall S \subseteq M; \\
x_j \ge 0, & j \in N
\end{array}$$

(where $N_S := \cup (N_i : i \in S)$), each of which is dominated (strictly if $S \ne M$) by
the sum of the inequalities of (SC) indexed by $S$. Hence the LP relaxation of the
UPL formulation is weaker than that of (SC).

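If, on the other hand, we use the so called "strong" formulation of the
uncapacitated plant location problem, namely

$$\begin{array}{ll}
\min \sum (c_j x_j : j \in N) & \\
\sum (u_{ij} : j \in N_i) \ge 1, & i \in M \\
-u_{ij} + x_j \ge 0, & i \in M,\ j \in N; \\
u_{ij} \ge 0,\ x_j \in \{0, 1\}, & i \in M,\ j \in N;
\end{array}$$

then the projected inequalities include

$$\begin{array}{ll}
\sum (x_j : j \in N_i) \ge 1, & i \in M; \\
x_j \ge 0, & j \in N
\end{array}$$

thus yielding the same LP relaxation as that of (SC).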

Nonlinear 0-1 Programming. Consider the nonlinear inequality 

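$$\sum_{j \in N} a_j \Bigl(\prod_{i \in Q_j} x_i\Bigr) \le b, \qquad a_j > 0,\ j \in N$$

$$x_i \in \{0, 1\}, \qquad i \in Q_j,\ j \in N$$

where $\prod$ denotes product.

It is a well known linearization technique due to R. Fortet [25] to replace this
inequality by

$$\begin{array}{ll}
\sum_{j \in N} a_j y_j \le b & \\
-y_j + \sum_{i \in Q_j} x_i \le |Q_j| - 1, & j \in N \\
y_j - x_i \le 0, & i \in Q_j,\ j \in N \\
y_j \ge 0,\ x_i \in \{0, 1\}, & i \in Q_j,\ j \in N
\end{array}$$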

When the number of nonlinear terms is small relative to the number of vari- 
ables, this is perfectly satisfactory. However, often the number of terms is a 
high-degree polynomial in the number of variables, in which case it is desirable 
to eliminate the new variables yj, j G N. 
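That the linear constraints force each new variable to equal the corresponding product at 0-1 points can be confirmed by enumeration. The following sketch uses hypothetical function names and checks a single term $Q_j$ of a given size; for fixed $x$ it computes the interval of values of $y_j$ allowed by the three constraint families.

```python
from itertools import product

def fortet_interval(xs):
    """Range of y allowed by  -y + sum(x) <= |Q| - 1,  y - x_i <= 0,  y >= 0."""
    lower = max(0, sum(xs) - (len(xs) - 1))
    upper = min(xs)
    return lower, upper

def check_fortet(size):
    """True if, for every 0-1 assignment of `size` variables, y must equal the product."""
    for xs in product((0, 1), repeat=size):
        prod_x = int(all(xs))
        if fortet_interval(xs) != (prod_x, prod_x):
            return False
    return True

print(all(check_fortet(k) for k in range(1, 6)))
print(fortet_interval((0.5, 0.5)))  # at a fractional point y is no longer pinned down
```

At fractional points the interval is generally not a single value, which is one reason the strength of the resulting LP relaxation is of interest.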

The cone needed for projecting the above constraint set onto the $x$-space has
for every $M \subseteq N$ an extreme direction vector

$$v_M = (1;\ w_M;\ 0),$$

where 1 is a scalar, 0 is the zero vector with $\sum (|Q_j| : j \in N)$ components, and
$w_M \in \mathbb{R}^{|N|}$ is defined by

$$(w_M)_j = \begin{cases} a_j & \text{if } j \in M \\ 0 & \text{if } j \in N - M. \end{cases}$$

The corresponding projected inequalities are

$$\sum_{i \in Q_M} \Bigl(\sum_{j \in M :\, i \in Q_j} a_j\Bigr) x_i \le b + \sum_{j \in M} a_j (|Q_j| - 1), \qquad M \subseteq N,$$

where $Q_M := \cup (Q_j : j \in M)$.

But this is precisely the linearization of Balas and Mazzola [?], arrived at
by other means.

Exercise 4. Show that the extreme rays of the cone $W$ defined in (2) are the
incidence vectors of the directed cycles of $G$.

4 Proving the Integrality of Polyhedra 

One of the important uses of projection in combinatorial optimization is to prove 
the integrality of certain polyhedra. It often happens that the LP relaxation of a 
certain formulation, say P, does not satisfy any of the known sufficient conditions 
for it to have the integrality property; but there exists a higher dimensional for- 
mulation whose LP relaxation, say Q, satisfies such a condition (for the relevant 
variables). In such a case all we need to do is to show that P is the projection of 
Q onto the subspace of the relevant variables. It then follows from Proposition 1
that P is integral.

We will illustrate the procedure on several examples. 

Perfectly Matchable Subgraphs of a Bipartite Graph. Let $G = (V, E)$ be
a bipartite graph with bipartition $V = V_1 \cup V_2$, let $G(W)$ denote the subgraph
induced by $W \subseteq V$, and let $X$ be the set of incidence vectors of vertex sets $W$
such that $G(W)$ has a perfect matching.

The Perfectly Matchable Subgraph (PMS-) polytope of $G$ is then $\mathrm{conv}(X)$,
the convex hull of $X$. Its linear characterization [14] can be obtained by projection
as follows.

Fact (from the König-Hall Theorem)

$G(W)$ has a perfect matching if and only if

$$|W \cap V_1| = |W \cap V_2|$$

and for every $S \subseteq W \cap V_1$

$$|S| \le |N(S)|,$$

where

$$N(S) := \{j \in V \mid (i, j) \in E \text{ for some } i \in S\}$$

(in the Fact above, neighbors are counted within $G(W)$).
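The condition can be tested by brute force on small graphs. The sketch below is an illustration with hypothetical function names; it assumes that edges are stored as pairs $(i, j)$ with $i \in V_1$ and $j \in V_2$, counts neighbors inside $G(W)$, and compares the outcome with a direct search over all bijections between the two sides of $W$. Both routines are exponential and meant only for tiny examples.

```python
from itertools import combinations, permutations

def hall_condition(W, V1, V2, E):
    """Konig-Hall test for the subgraph induced by W in the bipartite graph (V1, V2, E)."""
    W1 = [v for v in V1 if v in W]
    W2 = [v for v in V2 if v in W]
    if len(W1) != len(W2):
        return False
    for k in range(1, len(W1) + 1):
        for S in combinations(W1, k):
            # Neighbors of S, restricted to the nodes of W.
            nbrs = {j for j in W2 if any((i, j) in E for i in S)}
            if len(S) > len(nbrs):
                return False
    return True

def has_perfect_matching(W, V1, V2, E):
    """Reference check: try every bijection between the two sides of W."""
    W1 = [v for v in V1 if v in W]
    W2 = [v for v in V2 if v in W]
    if len(W1) != len(W2):
        return False
    return any(all((i, j) in E for i, j in zip(W1, perm)) for perm in permutations(W2))

V1, V2 = ("a", "b", "c"), ("x", "y", "z")
E = {("a", "x"), ("b", "x"), ("b", "y"), ("c", "y")}
print(hall_condition({"a", "b", "x", "y"}, V1, V2, E))
print(hall_condition({"a", "c", "x", "z"}, V1, V2, E))
```

The incidence vectors of the sets $W$ accepted by `hall_condition` are exactly the points of $X$ for this small graph.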

Theorem 3. [14] The PMS polytope of the bipartite graph $G$ is defined by the
system

$$\begin{array}{ll}
0 \le x_i \le 1, & i \in V \\
x(V_1) - x(V_2) = 0 & \\
x(S) - x(N(S)) \le 0, & S \subseteq V_1.
\end{array} \qquad (3)$$

If $0 \le x_i \le 1$ is replaced by $x_i \in \{0, 1\}$, the theorem is simply a restatement
of the König-Hall condition in terms of incidence vectors. So the proof of the
theorem amounts to showing that the polytope defined by (3) is integral.

Note that the coefficient matrix of (3) is not totally unimodular.

To prove the theorem, we restate the constraint set in terms of vertex and
edge variables, $x_i$ and $u_{ij}$, respectively, i.e., in a higher dimensional space. We
then obtain the system

$$\begin{array}{ll}
u(i, N(i)) - x_i = 0, & i \in V_1 \\
u(N(j), j) - x_j = 0, & j \in V_2 \\
x(V_1) - x(V_2) = 0 & \\
u_{ij} \ge 0,\ (i, j) \in E; & 0 \le x_i \le 1,\ i \in V
\end{array} \qquad (4)$$

whose coefficient matrix is totally unimodular. Here $u(i, N(i)) := \sum (u_{ij} : j \in
N(i))$ and $u(N(j), j) := \sum (u_{ij} : i \in N(j))$. Thus the polyhedron defined by (4)
is integral, and if it can be shown that its projection is the polyhedron defined by
(3), this is proof that the latter is also integral. This is indeed the case (see [?]
for a proof). Moreover, the projection yields the system (3) with the condition
$S \subseteq V_1$ replaced by

$S \subseteq V_1$ such that $G(S \cup N(S))$ and $G((K_1 \setminus S) \cup (K_2 \setminus N(S)))$ are connected

(where $K$ is the component of $G$ containing $S \cup N(S)$, and $K_i = K \cap V_i$, $i = 1, 2$).

This in turn allows one to weaken the "if" requirement of the König-Hall
Theorem to "if $|S| \le |N(S)|$ for all $S \subseteq V_1$ such that $G(S \cup N(S))$ and $G((K_1 \setminus
S) \cup (K_2 \setminus N(S)))$ are connected."

Assignable Subgraphs of a Digraph. The well known Assignment Problem 
(AP) asks for assigning n people to n jobs. When represented on a digraph, an 
assignment, i.e. a solution to AP, consists of a collection of arcs that forms a 
node-disjoint union of cycles (a cycle decomposition). A digraph G = {V,A) is 
assignable (admits a cycle decomposition) if the assignment problem on G has 
a solution. The Assignable Subgraph Polytope of a digraph is the convex hull of 
incidence vectors of node sets W such that G{W) is assignable. 

Let $\deg^+(v)$ and $\deg^-(v)$ denote the outdegree and indegree, respectively, of
$v$, and for $S \subseteq V$, let $\Gamma(S) := \{j \in V \mid (i, j) \in A \text{ for some } i \in S\}$.

Projection can again be used to prove the following

Theorem 4. [4] The Assignable Subgraph Polytope of the digraph $G$ is defined
by the system

$$\begin{array}{ll}
0 \le x_i \le 1, & i \in V \\
x(S \setminus \Gamma(S)) - x(\Gamma(S) \setminus S) \le 0, & S \subseteq V.
\end{array}$$



s-t Path Decomposable Subgraphs of an Acyclic Digraph. An acyclic
digraph $G = (V, A)$ with two distinguished nodes, $s$ and $t$, is said to admit an s-t
path decomposition if there exists a collection of interior node disjoint s-t paths
that cover all the nodes of $G$. The s-t Path Decomposable Subgraph Polytope of
$G$ is then the convex hull of incidence vectors of node sets $W \subseteq V - \{s, t\}$ such
that $G(W \cup \{s, t\})$ admits an s-t path decomposition.

For $S \subseteq V$, define

$$\Gamma^*(S) := \begin{cases} \Gamma(S) & \text{if } t \notin \Gamma(S) \\ (\Gamma(S) \setminus \{t\}) \cup \Gamma(s) & \text{if } t \in \Gamma(S). \end{cases}$$

Projection can then be used to prove the following

Theorem 5. [3] The s-t Path Decomposable Subgraph Polytope of the acyclic
digraph $G = (V, A)$ is defined by the system

$$\begin{array}{ll}
0 \le x_i \le 1, & i \in V \\
x(S \setminus \Gamma^*(S)) - x(\Gamma^*(S) \setminus S) \le 0, & S \subseteq V - \{s, t\}.
\end{array}$$



Perfectly Matchable Subgraphs of an Arbitrary Graph. For an arbi- 
trary undirected graph G = (V,E), the perfectly matchable subgraph (PMS-) 
polytope, defined as before, can also be characterized by projection, but this 
is a considerably more arduous task than in the case of a bipartite graph. The 
difficulty partly stems from the fact that the projection cone in this case is not 
pointed and thus instead of the extreme rays one has to work with a finite set 
of generators. On the other hand, an interesting feature of the technique used in 
this case is that a complete set of generators did not have to be found; it was suf- 
ficient to identify a set of generators that produce all facet defining inequalities 
of the PMS-polytope. 

For $W \subseteq V$, let $G(W)$ be the subgraph of $G$ induced by $W$, let $c(W)$ be the
number of components of $G(W)$, and let $N(W)$ be the set of neighbors of $W$,
i.e. $N(W) := \{j \in V \setminus W : (i, j) \in E \text{ for some } i \in W\}$.

Theorem 6. [15] The PMS polytope of an arbitrary graph $G = (V, E)$ is defined
by the system

$$\begin{array}{ll}
0 \le x_i \le 1, & i \in V \\
x(S) - x(N(S)) \le |S| - c(S) &
\end{array} \qquad (5)$$

for all $S \subseteq V$ such that every component of $G(S)$ consists of a single node or
else is a nonbipartite graph with an odd number of nodes.

For further details and a proof see [15].

Exercise 5. Discuss the connection between Theorem 6 and Tutte's condition
for a graph to have a perfect matching.

5 Disjunctive Programming

Disjunctive programming is optimization over unions of polyhedra. While poly- 
hedra are convex sets, their unions of course are not. The name reflects the fact 
that the objects investigated by this theory can be viewed as the solution sets 
of systems of linear inequalities joined by the logical operations of conjunction, 
negation (taking of complement) and disjunction, where the nonconvexity is due 
to the presence of disjunctions. Pure and mixed integer programs, in particular 
pure and mixed 0-1 programs can be viewed as disjunctive programs; but the 
same is true of a host of other problems, like for instance the linear complementarity
problem. It is clear that if, for instance, a linear program over a feasible set
$F$ is amended with the condition that variable $x_j$ has to be an integer between
0 and $k$, which can be written as $(x_j = 0) \vee (x_j = 1) \vee \ldots \vee (x_j = k)$ (with "$\vee$" the
logical "or" symbol), then it becomes an optimization problem over a union of
polyhedra $P_0 \cup P_1 \cup \ldots \cup P_k$, where $P_i := \{x \in F : x_j = i\}$ for $i = 0, 1, \ldots, k$. The
main application of disjunctive programming has so far been to integer and, in
particular, 0-1 programming, where it has served as the source of a rich family of
cutting planes and in the early 90's has produced a computationally successful
variant known as lift-and-project [7].

The foundations of disjunctive programming were laid in a July 1974 technical
report, published 24 years later as an invited paper [1] with a foreword.
For additional work on disjunctive programming in the seventies and eighties 
see [21,311 011 (111 9l20l2fll27l2sl3sj . In particular, [2] contains a detailed account of 
the origins of the disjunctive approach and the relationship of disjunctive cuts 
to Gomory's mixed integer cut, intersection cuts and others. Disjunctive
programming received a new impetus in the early nineties from the work on matrix
cones by Lovász and Schrijver [?], see also Sherali and Adams [?]. The version
that led to the computational breakthroughs of the nineties is described
in the two papers by Balas, Ceria and Cornuéjols [7,8], the first of which
discusses the cutting plane theory behind the approach, while the second deals with
the branch-and-cut implementation and computational testing. Related recent 
developments are discussed in I9I5I17I21I22I23I30I36I40I42I431 . 

A disjunctive set, i.e. the constraint set of a disjunctive program, can be 
expressed in many different forms, of which the following two extreme ones have 
special significance. Let 

$$P_i := \{x \in \mathbb{R}^n : A^i x \ge b^i\}, \qquad i \in Q$$

be convex polyhedra, with $Q$ a finite index set and $(A^i, b^i)$ an $m_i \times (n + 1)$
matrix, $i \in Q$, and let $P := \{x \in \mathbb{R}^n : Ax \ge b\}$ be the polyhedron defined by
those inequalities (if any) common to all $P_i$, $i \in Q$. Then the disjunctive set
$\bigcup_{i \in Q} P_i$ over which we wish to optimize some linear function can be expressed as

$$\Bigl\{x \in \mathbb{R}^n : \bigvee_{i \in Q} (A^i x \ge b^i)\Bigr\}, \qquad (6)$$

which is its disjunctive normal form (a disjunction whose terms do not contain
further disjunctions). The same disjunctive set can also be expressed as

$$\Bigl\{x \in \mathbb{R}^n : Ax \ge b,\ \bigvee_{h \in Q_j} (d^h x \ge d^h_0),\ j = 1, \ldots, t\Bigr\}, \qquad (7)$$

which is its conjunctive normal form (a conjunction whose terms do not contain
further conjunctions). Here $(d^h, d^h_0)$ is an $(n + 1)$-vector for $h \in Q_j$, all $j$, where
each set $Q_j$ contains exactly one inequality of each system $A^i x \ge b^i$, $i \in Q$, and
$t$ is the number of all sets $Q_j$ with this property. Thus the connection between
(6) and (7) is that each term $A^i x \ge b^i$ of (6) contains $Ax \ge b$ and exactly one
inequality $d^h x \ge d^h_0$ of each disjunction of (7) indexed by $Q_j$ for $j = 1, \ldots, t$,
and that all distinct systems $A^i x \ge b^i$ with this property are present among the
terms of (6).
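The passage from the disjunctive to the conjunctive normal form is the usual distributive law, and the index sets $Q_j$ can be enumerated mechanically. The sketch below is only an illustration: the function name `cnf_disjunctions` is hypothetical, inequalities are represented by arbitrary labels, and no simplification is attempted beyond merging identical selections.

```python
from itertools import product

def cnf_disjunctions(systems):
    """systems[i] lists the inequalities of the i-th term of the disjunctive normal form.

    Returns the disjunctions of the conjunctive normal form: one for each way of
    picking exactly one inequality from every term (identical selections merged).
    """
    seen, result = set(), []
    for choice in product(*systems):
        key = frozenset(choice)
        if key not in seen:
            seen.add(key)
            result.append(sorted(key))
    return result

# (a and b) or (c and d)  ==  (a or c) and (a or d) and (b or c) and (b or d)
print(cnf_disjunctions([["a", "b"], ["c", "d"]]))
# An inequality "p" common to all terms yields the one-element disjunction ["p"],
# i.e. a constraint that holds unconditionally, like the system Ax >= b.
print(cnf_disjunctions([["p", "a"], ["p", "c"]]))
```

The number of disjunctions grows as the product of the sizes of the individual systems, which is why the two normal forms are described as extreme ones.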

The lift-and-project approach relies mainly on the following two basic ideas
(results) of disjunctive programming, the first one of which is derived from the
form (6) while the second from the form (7):

1. There is a compact representation of the convex hull of a union of poly- 
hedra in a higher dimensional space, which in turn can be projected back into 
the original space. The first step of this operation, i.e. the higher dimensional 
representation, may be viewed as lifting (or extended formulation), while the 
second step is projection. As a result one obtains the convex hull in the original 
space. 

2. A large class of disjunctive sets, called facial, can be convexified sequen- 
tially, i.e. their convex hull can be derived by imposing the disjunctions one at 
a time, generating each time the convex hull of the current set. 

We will discuss these two ideas in turn. 

The Convex Hull of a Union of Polyhedra.

Theorem 7. [1] Given polyhedra $P_i := \{x \in \mathbb{R}^n : A^i x \ge b^i\} \ne \emptyset$, $i \in Q$, the
closed convex hull of $\bigcup_{i \in Q} P_i$ is the set of those $x \in \mathbb{R}^n$ for which there exist vectors
$(y^i, y^i_0) \in \mathbb{R}^{n+1}$, $i \in Q$, satisfying

$$\begin{array}{ll}
x - \sum (y^i : i \in Q) = 0 & \\
A^i y^i - b^i y^i_0 \ge 0 & \\
y^i_0 \ge 0 & i \in Q \\
\sum (y^i_0 : i \in Q) = 1. &
\end{array} \qquad (8)$$

In particular, denoting by $P_Q := \mathrm{conv}(\bigcup_{i \in Q} P_i)$ the closed convex hull of $\bigcup_{i \in Q} P_i$
and by $\mathcal{P}$ the set of vectors $(x, \{y^i, y^i_0\}_{i \in Q})$ satisfying (8),

(i) if $x^*$ is an extreme point of $P_Q$, then $(\bar{x}, \{\bar{y}^i, \bar{y}^i_0\}_{i \in Q})$ is an extreme point of
$\mathcal{P}$, with $\bar{x} = x^*$, $(\bar{y}^k, \bar{y}^k_0) = (x^*, 1)$ for some $k \in Q$, and $(\bar{y}^i, \bar{y}^i_0) = (0, 0)$ for
$i \in Q \setminus \{k\}$.

(ii) if $(\bar{x}, \{\bar{y}^i, \bar{y}^i_0\}_{i \in Q})$ is an extreme point of $\mathcal{P}$, then $\bar{y}^k = \bar{x} = x^*$ and $\bar{y}^k_0 = 1$
for some $k \in Q$, $(\bar{y}^i, \bar{y}^i_0) = (0, 0)$, $i \in Q \setminus \{k\}$, and $x^*$ is an extreme point of
$P_Q$.

Note that in this higher dimensional representation of $P_Q$, the number of
variables and constraints is linear in the number $|Q|$ of polyhedra in the union,
and so is the number of facets of $\mathcal{P}$. Note also that in any basic solution of
the linear system (8), $y^i_0 \in \{0, 1\}$, $i \in Q$, automatically, without imposing this
condition explicitly.
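To make the higher dimensional representation concrete, the sketch below builds the vectors $(y^i, y^i_0)$ from one point of each polyhedron and a set of convex weights, and then checks the four constraint families of the lifted system. It is an illustration only: the function names are hypothetical, each polyhedron is passed as a pair `(A, b)` meaning $A x \ge b$, and the code verifies a given certificate rather than searching for one.

```python
from fractions import Fraction

def lift(points, weights):
    """From one point per polyhedron and convex weights, set
    y0^i = weight_i, y^i = weight_i * point_i and x = sum of the y^i."""
    ys = [[w * c for c in p] for p, w in zip(points, weights)]
    x = [sum(col) for col in zip(*ys)]
    return x, ys, list(weights)

def satisfies_lifted_system(x, ys, y0s, systems):
    """Check x = sum y^i, A^i y^i - b^i y0^i >= 0, y0^i >= 0 and sum y0^i = 1."""
    if x != [sum(col) for col in zip(*ys)]:
        return False
    if any(y0 < 0 for y0 in y0s) or sum(y0s) != 1:
        return False
    for y, y0, (A, b) in zip(ys, y0s, systems):
        for row, rhs in zip(A, b):
            if sum(a * c for a, c in zip(row, y)) - rhs * y0 < 0:
                return False
    return True

# P_1 = [0, 1] and P_2 = [3, 4] on the real line, each written as A x >= b.
systems = [([[1], [-1]], [0, -1]), ([[1], [-1]], [3, -4])]
half = Fraction(1, 2)
x, ys, y0s = lift([[1], [3]], [half, half])
print(x, satisfies_lifted_system(x, ys, y0s, systems))
```

Here the point $x = 2$ lies in neither interval, yet it admits a lifted certificate, in agreement with the fact that it belongs to the convex hull $[0, 4]$ of the union.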

If we impose simultaneously all the integrality conditions of a mixed 0-1
program with $p$ 0-1 variables, we have a disjunction with $2^p$ terms, one for every
$p$-component 0-1 point. But if we impose only disjunctions that yield a set $Q$
of manageable size, then this representation becomes extremely useful (such an
approach is facilitated by the sequential convexifiability of facial disjunctive sets,
see below).

In the special case of a disjunction of the form $x_j \in \{0, 1\}$, when $|Q| = 2$ and

$$P_{j0} := \{x \in \mathbb{R}^n : Ax \ge b,\ x_j = 0\},$$

$$P_{j1} := \{x \in \mathbb{R}^n : Ax \ge b,\ x_j = 1\},$$

$P_Q := \mathrm{conv}(P_{j0} \cup P_{j1})$ is the set of those $x \in \mathbb{R}^n$ for which there exist vectors
$(y, y_0), (z, z_0) \in \mathbb{R}^{n+1}$ such that

$$\begin{array}{l}
x - y - z = 0 \\
Ay - b y_0 \ge 0 \\
-y_j = 0 \\
Az - b z_0 \ge 0 \\
z_j - z_0 = 0 \\
y_0 + z_0 = 1
\end{array} \qquad (9)$$

Unlike the general system (8), the system (9), in which $|Q| = 2$, is of quite
manageable size.



Projection and Polarity 

In order to generate the convex hull Pq, and more generally, to obtain valid 
inequalities (cutting planes) in the space of the original variables, we project V 
onto the x-space: 

Theorem 8. [l] Proj^{V) = {x G K” : ax > /9 for all {a, (3) G Wo}, 
where 



Wo := {(a,/3) G : a = (3 < for some u® > 0, f G Q}. 



□ 

Note that Wq is not the standard projection cone of V, introduced in sec- 
tion 1, which is (assuming each A® is mi x n) 

W := {(a, P, Klieg) G R" x K x R"®® x . . . x R"®i«i : 

a - u^A^ = 0, /3 - uV < 0, M® > 0, i G Q}. 
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Instead, Wq = Proj(„_^)(VE), i.e. Wq is the projection of W onto the (a,/3)- 
space. In fact, Wq can be shown to be the reverse polar eone Pq of Pq, i.e. the 
cone of all valid inequalities for Pq. 

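Theorem 9. [1]

$$\begin{aligned}
P_Q^* &:= \{(\alpha, \beta) \in \mathbb{R}^{n+1} : \alpha x \ge \beta \text{ for all } x \in P_Q\} \\
&= \{(\alpha, \beta) \in \mathbb{R}^{n+1} : \alpha = u^i A^i,\ \beta \le u^i b^i \text{ for some } u^i \ge 0,\ i \in Q\}.
\end{aligned}$$

□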



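To turn again to the special case of a disjunction of the form $x_j \in \{0, 1\}$,
projecting the lifted system for $|Q| = 2$ onto the $x$-space yields the polyhedron $P_Q$ whose
reverse polar cone is

$$\begin{aligned}
P_Q^* = \{(\alpha, \beta) \in \mathbb{R}^{n+1} :\ & \alpha \ge uA - u_0 e_j \\
& \alpha \ge vA + v_0 e_j \\
& \beta \le ub \\
& \beta \le vb + v_0 \\
& u, v \ge 0\}
\end{aligned}$$

(where $e_j$ is the $j$-th unit vector).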

One of the main advantages of the higher dimensional representation is that
in projecting it onto the $x$-space we have a straightforward criterion to distinguish
facets of $P_Q$ from other valid inequalities:

Theorem 10. [1] Assume $P_Q$ is full dimensional. The inequality $\alpha x \ge \beta$ defines
a facet of $P_Q$ if and only if $(\alpha, \beta)$ is an extreme ray of the cone $P_Q^*$.

Next we turn to the class of disjunctive programs that are sequentially
convexifiable.



Sequential Convexification 

A disjunctive set is called facial if every inequality appearing in a term of the
disjunction induces a face of $P$, the polyhedron defined by the inequalities
$Ax \ge b$ common to all terms of the disjunction. Zero-one programs (pure or mixed)
are facial disjunctive programs, general integer programs are not. Sequential
convexifiability is one of the basic properties that distinguish 0-1 programs from
general integer programs.

Theorem 11. [1] Let

$$D := \{x \in \mathbb{R}^n : Ax \ge b,\ \bigvee_{h \in Q_j} (d^h x \ge d^h_0),\ j = 1, \dots, t\},$$

where $1 \le t \le n$, $|Q_j| \ge 1$ for $j = 1, \dots, t$, and $D$ is facial. Let $P_D := \mathrm{conv}(D)$.
Define

$$P^0 (= P) := \{x \in \mathbb{R}^n : Ax \ge b\},$$

and for $j = 1, \dots, t$,

$$P^j := \mathrm{conv}\Bigl(P^{j-1} \cap \{x : \bigvee_{h \in Q_j} (d^h x \ge d^h_0)\}\Bigr).$$

Then

$$P^t = P_D.$$

While faciality is a sufficient condition for sequential convexifiability, it is
not necessary; a necessary and sufficient condition is also known. The most
important class of facial disjunctive programs are mixed 0-1 programs, and for
that case Theorem 11 asserts that if we denote

$$P_D := \mathrm{conv}\{x \in \mathbb{R}^n : Ax \ge b,\ x_j \in \{0, 1\},\ j = 1, \dots, p\},$$

$$P^0 := \{x \in \mathbb{R}^n : Ax \ge b\},$$

and define recursively for $j = 1, \dots, p$

$$P^j := \mathrm{conv}(P^{j-1} \cap \{x : x_j \in \{0, 1\}\}),$$

then

$$P^p = P_D.$$



Thus, in principle, a 0-1 program with p 0-1 variables can be solved in p 
steps. Here each step consists of imposing the 0-1 condition on one new variable 
and generating all the inequalities that define the convex hull of the set defined 
in this way. 

Note that while pure and mixed 0-1 programs are facial disjunctive programs 
and therefore are sequentially convexifiable, general integer programs - whether 
pure or mixed - are not. 

Disjunctive Rank 

A useful concept in cutting plane theory is that of rank with respect to a cut
generating procedure. If $Ax \ge b$ is the linear programming relaxation of some
integer program, every valid inequality can be derived by (repeatedly) taking
nonnegative combinations of $Ax \ge b$ and rounding down the resulting inequality,
as in $\lfloor \lambda A \rfloor x \ge \lfloor \lambda b \rfloor$ with $\lambda \ge 0$. The minimum number of times this procedure
has to be iterated in order to obtain a given valid inequality (or an inequality
that dominates it) is known as the Chvátal rank of that inequality (see e.g. [33]).

Based on the sequential convexifiability of facial disjunctive programs, one
can define the disjunctive rank of an inequality $\alpha x \ge \beta$ for a mixed 0-1 program
as the smallest integer $k$ for which there exists an ordering $\{i_1, \dots, i_p\}$ of
$\{1, \dots, p\}$ such that $\alpha x \ge \beta$ is valid for the polyhedron obtained by applying the
recursion above to $x_{i_1}, \dots, x_{i_k}$ in that order. In other words, an inequality is
of rank $k$ if it can be obtained by $k$, but not by fewer than $k$, applications of
the recursive procedure defined above. Clearly, the disjunctive rank of a cutting
plane for 0-1 programs is bounded by the number of 0-1 variables. It is known
that the number of 0-1 variables is not a valid bound for the Chvátal rank of an
inequality.

The above definition of the disjunctive rank is based on using the disjunctions
$x_j \in \{0, 1\}$, $j = 1, \dots, p$. Tighter bounds can be derived by using stronger
disjunctions. For instance, a 0-1 program whose constraints include the generalized
upper bounds $\sum(x_j : j \in Q_i) = 1$, $i = 1, \dots, t$, with $|Q_i| = |Q_j| = q$,
$Q_i \cap Q_j = \emptyset$, $i, j \in \{1, \dots, t\}$, and $|\bigcup_{i=1}^{t} Q_i| = p$, can be solved as a disjunctive
program with the disjunctions

$$\bigvee_{j \in Q_i} (x_j = 1), \qquad i = 1, \dots, t\ (= p/q),$$

in which case the disjunctive rank of any cut is bounded by the number $t = p/q$
of GUB constraints.

Another Derivation of the Basic Results 

The two basic ingredients of our approach, the lifting/projection technique and
sequential convexification, can also be derived by the following procedure [7].
Define

$$P := \{x : Ax \ge b\} \subseteq \mathbb{R}^n$$

and

$$P_D := \mathrm{conv}(\{x \in P : x_j \in \{0, 1\},\ j = 1, \dots, p\}),$$

with the inequalities $x \ge 0$ and $x_j \le 1$, $j = 1, \dots, p$, included in $Ax \ge b$.

1. Select an index $j \in \{1, \dots, p\}$. Multiply $Ax \ge b$ with $1 - x_j$ and $x_j$ to obtain
the nonlinear system

$$(1 - x_j)(Ax - b) \ge 0, \qquad x_j(Ax - b) \ge 0.$$

2. Linearize this system by substituting $y_i$ for $x_i x_j$, $i = 1, \dots, n$, $i \ne j$, and $x_j$ for $x_j^2$.

3. Project the resulting polyhedron onto the $x$-space.

Theorem 12. [7] The outcome of steps 1, 2, 3 is

$$\mathrm{conv}(P \cap \{x : x_j \in \{0, 1\}\}).$$

Corollary 5. [7] Repeating steps 1, 2, 3 for each $j \in \{1, \dots, p\}$ in turn yields
$P_D$.
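
To make steps 1, 2, 3 concrete, here is a small worked instance. The polytope below is our own illustrative choice and does not come from the text; only the two products needed to reach the final inequality are displayed, whereas the full procedure multiplies every inequality of $Ax \ge b$.

```latex
% Illustrative instance (an assumption, not from the source), with n = 2, p = 1, j = 1:
%   P = { x in R^2 : x_1 - x_2 >= -1/2,  -x_1 - x_2 >= -3/2,  0 <= x_1 <= 1,  x_2 >= 0 }.
\begin{align*}
\text{Step 1:}\quad & (1 - x_1)\bigl(x_1 - x_2 + \tfrac12\bigr) \ge 0, \qquad
                      x_1\bigl(-x_1 - x_2 + \tfrac32\bigr) \ge 0. \\
\text{Step 2:}\quad & \text{substituting } y \text{ for } x_1 x_2 \text{ and } x_1 \text{ for } x_1^2: \quad
                      -\tfrac12 x_1 - x_2 + y + \tfrac12 \ge 0, \qquad
                      \tfrac12 x_1 - y \ge 0. \\
\text{Step 3:}\quad & \text{adding the two inequalities eliminates } y \text{ (and } x_1\text{)}: \quad
                      x_2 \le \tfrac12 .
\end{align*}
```

For this instance $\mathrm{conv}(P \cap \{x : x_1 \in \{0, 1\}\}) = [0, 1] \times [0, \tfrac12]$, so the projected inequality $x_2 \le \tfrac12$ is a facet of the convex hull, and it cuts off the point $(\tfrac12, 1) \in P$.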

The fact that this procedure is isomorphic to the one introduced earlier can
be seen by examining the outcome of step 2. In fact, the linearized system resulting
from step 2 is precisely (5.3'), the higher dimensional representation of
the disjunctive set defined by the constraint $x_j \in \{0, 1\}$ (see [7] for details).

The above 3-step procedure is a streamlined version of the matrix cone procedure
of Lovász and Schrijver [31]. The latter involves in step 1 multiplication
with $1 - x_j$ and $x_j$ for every $j \in \{1, \dots, p\}$ rather than just one. While obtaining
$P_D$ by the Lovász-Schrijver procedure still involves $p$ iterations of steps 1, 2, 3,
the added computational cost brings a reward: after each iteration, the coefficient
matrix of the linearized system must be positive semidefinite, a condition
that can be used in various ways to derive strong bounds or cuts (see [31], [9]).

Another similar procedure, due to Sherali and Adams, is based on multiplication
with every product of the form $\bigl(\prod_{j \in J_1} x_j\bigr)\bigl(\prod_{j \in J_2} (1 - x_j)\bigr)$, where $J_1$ and
$J_2$ are disjoint subsets of $\{1, \dots, p\}$ such that $|J_1 \cup J_2| = t$ for some $1 \le t \le p$.
Linearizing the resulting nonlinear system leads to a higher dimensional polyhedron
whose strength (tightness) is intermediate between $P$ and $P_D$, depending
on the choice of $t$: for $t = p$, the polyhedron becomes identical to $\mathcal{P}$ as defined by
the general lifted system for a 0-1 polytope.



Disjunctive Cuts. Theorem 9 describes the set of valid inequalities for a union
of polyhedra $P_i$ as defined earlier. If we replace this definition by $P_i :=
\{x \in \mathbb{R}^n_+ : A^i x \ge b^i\}$, then the valid inequalities for $P_Q = \mathrm{conv}(\bigcup_{i \in Q} P_i)$ are of the
form $\alpha x \ge \beta$, where $(\alpha, \beta) \in \mathbb{R}^n \times \mathbb{R}$ satisfies

$$\alpha \ge u^i A^i, \qquad \beta \le u^i b^i, \qquad i \in Q \qquad (10)$$

for some $u^i \ge 0$, $i \in Q$.

Another way of stating condition (10) is

$$\alpha_j = \max_{i \in Q} u^i a^i_j, \qquad \beta \le \min_{i \in Q} u^i b^i \qquad (11)$$

where $a^i_j$ is the $j$-th column of $A^i$, $i \in Q$.

Since (10) - or, alternatively, (11) - defines all valid inequalities for a disjunctive
program, every valid cutting plane for a disjunctive program can be
obtained from (11) by choosing suitable multipliers $u^i$, $i \in Q$. In other words,
any cutting plane for a combinatorial optimization problem that can be stated
as a disjunctive program, can be brought to the form (10) or (11). Once brought
to this form, the cuts can be improved by appropriate changes in the multipliers
$u^i$, $i \in Q$.
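
The formula (11) can be sketched in a few lines of Python. This is a minimal illustration, not code from the text: the function name `disjunctive_cut`, the data layout and the toy disjunction are all assumptions made for the example, and exact rational arithmetic is used only to keep the output readable.

```python
from fractions import Fraction as F

def disjunctive_cut(terms):
    """Combine the terms of a disjunction into one cut  alpha x >= beta.

    Each term is a list of (multiplier, coefficients, rhs) triples, one per
    inequality  coefficients . x >= rhs  of that term; x >= 0 is assumed and
    all multipliers must be nonnegative. Following (11), alpha_j is the
    largest aggregated coefficient over the terms and beta is the smallest
    aggregated right-hand side.
    """
    agg_rows, agg_rhs = [], []
    for term in terms:
        n = len(term[0][1])
        agg_rows.append([sum(u * a[j] for u, a, _ in term) for j in range(n)])
        agg_rhs.append(sum(u * b for u, _, b in term))
    alpha = [max(col) for col in zip(*agg_rows)]
    beta = min(agg_rhs)
    return alpha, beta

# Toy disjunction (hypothetical data): (x1 >= 1) or (x2 >= 2), with x >= 0.
alpha, beta = disjunctive_cut([
    [(F(1), [F(1), F(0)], F(1))],      # first term, multiplier 1
    [(F(1, 2), [F(0), F(1)], F(2))],   # second term, multiplier 1/2
])
print(alpha, beta)  # the cut x1 + (1/2) x2 >= 1
```

Changing the multipliers changes the cut, which is the degree of freedom exploited in the example that follows.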






E. Balas 



We will illustrate this on an example. 

Example. Consider the mixed integer program whose constraint set is

$$x_1 = 0.2 + 0.4(-x_3) + 1.3(-x_4) - 0.01(-x_5) + 0.07(-x_6)$$

$$x_2 = 0.9 - 0.3(-x_3) + 0.4(-x_4) - 0.04(-x_5) + 0.1(-x_6)$$

$$x_j \ge 0,\ j = 1, \dots, 6, \qquad x_j \text{ integer},\ j = 1, \dots, 4.$$

This problem is taken from an earlier paper, which also lists six cutting planes
derived from the associated group problem:

$$0.75x_3 + 0.875x_4 + 0.0125x_5 + 0.35x_6 \ge 1,$$

$$0.778x_3 + 0.444x_4 + 0.40x_5 + 0.111x_6 \ge 1,$$

$$0.333x_3 + 0.667x_4 + 0.033x_5 + 0.35x_6 \ge 1,$$

$$0.50x_3 + x_4 + 0.40x_5 + 0.25x_6 \ge 1,$$

$$0.444x_3 + 0.333x_4 + 0.055x_5 + 0.478x_6 \ge 1,$$

$$0.394x_3 + 0.636x_4 + 0.346x_5 + 0.155x_6 \ge 1.$$

The first two of these inequalities are the mixed-integer Gomory cuts derived
from the row of $x_1$ and $x_2$ respectively. To show how they can be improved, we
first derive them as they are. To do this, for a row of the form

$$x_i = a_{i0} + \sum_{j \in J} a_{ij}(-x_j),$$

with $x_j$ integer-constrained for $j \in J_1 := \{1, \dots, 4\}$, continuous for $j \in J_2 :=
\{5, 6\}$, one defines $f_{ij} = a_{ij} - \lfloor a_{ij} \rfloor$, $j \in J \cup \{0\}$, $\varphi_{i0} = f_{i0}$, and

$$\varphi_{ij} := \begin{cases}
f_{ij}, & j \in J_1^+ := \{j \in J_1 \mid f_{i0} \ge f_{ij}\}, \\
f_{ij} - 1, & j \in J_1^- := \{j \in J_1 \mid f_{i0} < f_{ij}\}, \\
a_{ij}, & j \in J_2.
\end{cases}$$

Then every $x$ which satisfies the above equation and the integrality constraints
on $x_j$, $j \in J_1 \cup \{i\}$, also satisfies the condition

$$y_i = \varphi_{i0} + \sum_{j \in J} \varphi_{ij}(-x_j), \qquad y_i \text{ integer}.$$

For the two equations of the example, the resulting conditions are

$$y_1 = 0.2 - 0.6(-x_3) - 0.7(-x_4) - 0.01(-x_5) + 0.07(-x_6), \qquad y_1 \text{ integer},$$

$$y_2 = 0.9 + 0.7(-x_3) + 0.4(-x_4) - 0.04(-x_5) + 0.1(-x_6), \qquad y_2 \text{ integer}.$$

Since each $y_i$ is integer-constrained, they have to satisfy the disjunction $y_i \le
0 \vee y_i \ge 1$. Substituting $\varphi_{i0} + \sum_{j \in J} \varphi_{ij}(-x_j)$ for each $y_i$ and applying the formula
(11) with multipliers $1/\varphi_{i0}$ in the first term, and $1/(1 - \varphi_{i0})$ in the
second term of each disjunction we obtain for $i = 1$ and $i = 2$ the two cuts

$$\frac{0.6}{0.8} x_3 + \frac{0.7}{0.8} x_4 + \frac{0.01}{0.8} x_5 + \frac{0.07}{0.2} x_6 \ge 1,$$

and

$$\frac{0.7}{0.9} x_3 + \frac{0.4}{0.9} x_4 + \frac{0.04}{0.1} x_5 + \frac{0.1}{0.9} x_6 \ge 1.$$

These are precisely the first two inequalities of the above list. Since all cuts
discussed here are stated in the form $\alpha x \ge 1$, the smaller the $j$-th coefficient, the
stronger is the cut in the direction $j$. We would thus like to reduce the size of
the coefficients as much as possible.
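
The derivation of the two mixed-integer Gomory cuts can be replayed numerically. The sketch below is only an illustration: the function name `gomory_mixed_integer_cut` and the dictionary layout are our own, the row data are those of the example, and the code assumes $0 < f_{i0} < 1$.

```python
from fractions import Fraction as F
from math import floor

def gomory_mixed_integer_cut(a0, a, is_integer):
    """Coefficients c_j of the cut  sum_j c_j x_j >= 1  derived from the row
    x_i = a0 + sum_j a[j] * (-x_j),  with x_i integer and all x_j >= 0.
    Assumes the fractional part f0 of a0 satisfies 0 < f0 < 1."""
    f0 = a0 - floor(a0)
    phi = {}
    for j, a_ij in a.items():
        if is_integer[j]:
            f_ij = a_ij - floor(a_ij)
            phi[j] = f_ij if f_ij <= f0 else f_ij - 1
        else:
            phi[j] = a_ij
    # y_i = f0 - sum_j phi_j x_j is integer, so y_i <= 0 or y_i >= 1:
    #   y_i <= 0  gives  sum_j  phi_j x_j >= f0,      scaled by 1/f0
    #   y_i >= 1  gives  sum_j -phi_j x_j >= 1 - f0,  scaled by 1/(1 - f0)
    # The cut takes the larger scaled coefficient for every j.
    return {j: max(p / f0, -p / (1 - f0)) for j, p in phi.items()}

is_integer = {3: True, 4: True, 5: False, 6: False}
row1 = {3: F(4, 10), 4: F(13, 10), 5: F(-1, 100), 6: F(7, 100)}
row2 = {3: F(-3, 10), 4: F(4, 10), 5: F(-4, 100), 6: F(1, 10)}
cut1 = gomory_mixed_integer_cut(F(2, 10), row1, is_integer)
cut2 = gomory_mixed_integer_cut(F(9, 10), row2, is_integer)
for cut in (cut1, cut2):
    print({j: round(float(c), 4) for j, c in cut.items()})
```

Up to rounding, the computed coefficients agree with the first two inequalities of the list: 3/4, 7/8, 1/80, 7/20 for the row of $x_1$, and 7/9, 4/9, 2/5, 1/9 for the row of $x_2$.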

Now suppose that instead of $y_1 \le 0 \vee y_1 \ge 1$, we use the disjunction

$$\{y_1 \le 0\} \vee \left\{ \begin{array}{l} y_1 \ge 1 \\ x_1 \ge 0 \end{array} \right\},$$

which of course is also satisfied by every feasible $x$.

Then, applying formula (11) with multipliers 5, 5 and 15 for $y_1 \le 0$, $y_1 \ge 1$
and $x_1 \ge 0$ respectively, we obtain the cut whose coefficients are

$$\begin{aligned}
\max\left\{ \frac{5 \times (-0.6)}{5 \times 0.2},\ \frac{5 \times 0.6 + 15 \times (-0.4)}{5 \times 0.8 + 15 \times (-0.2)} \right\} &= -3, \\
\max\left\{ \frac{5 \times (-0.7)}{5 \times 0.2},\ \frac{5 \times 0.7 + 15 \times (-1.3)}{5 \times 0.8 + 15 \times (-0.2)} \right\} &= -3.5, \\
\max\left\{ \frac{5 \times (-0.01)}{5 \times 0.2},\ \frac{5 \times 0.01 + 15 \times 0.01}{5 \times 0.8 + 15 \times (-0.2)} \right\} &= 0.2, \\
\max\left\{ \frac{5 \times 0.07}{5 \times 0.2},\ \frac{5 \times (-0.07) + 15 \times (-0.07)}{5 \times 0.8 + 15 \times (-0.2)} \right\} &= 0.35,
\end{aligned}$$

that is

$$-3x_3 - 3.5x_4 + 0.2x_5 + 0.35x_6 \ge 1.$$

The sum of coefficients on the left hand side has been reduced from 1.9875 to
-5.95. This strengthening has been obtained by assigning a positive multiplier
in (11) to the inequality $x_1 \ge 0$, which had a zero multiplier in the previous
derivation.

Similarly, for the second cut, if instead of $y_2 \le 0 \vee y_2 \ge 1$ we use the disjunction

$$\left\{ \begin{array}{l} y_2 \le 0 \\ x_1 \ge 0 \end{array} \right\} \vee \{y_2 \ge 1\},$$

with multipliers 10, 40 and 10 for $y_2 \le 0$, $x_1 \ge 0$ and $y_2 \ge 1$ respectively, we
obtain the cut

$$-7x_3 - 4x_4 + 0.4x_5 - x_6 \ge 1.$$

Here the sum of left hand side coefficients has been reduced from 1.733 to -11.6.
Again, the effect is obtained by assigning a positive multiplier to $x_1 \ge 0$, this
time on the other side of the disjunction.
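
Both strengthened cuts can be checked with exact arithmetic. The following is a sketch under the example's data only; the helper names `combine` and `cut` are our own, and every inequality is written in the form coefficients $\cdot\,(x_3, x_4, x_5, x_6) \ge$ right-hand side.

```python
from fractions import Fraction as F

def combine(term):
    """Nonnegative combination of the (multiplier, coeffs, rhs) triples of one term."""
    n = len(term[0][1])
    coeffs = [sum(m * c[k] for m, c, _ in term) for k in range(n)]
    rhs = sum(m * r for m, _, r in term)
    return coeffs, rhs

def cut(term_a, term_b):
    """Formula (11) for a two-term disjunction: componentwise max, min of the rhs."""
    (ca, ra), (cb, rb) = combine(term_a), combine(term_b)
    return [max(x, y) for x, y in zip(ca, cb)], min(ra, rb)

# Inequalities of the example in the variables (x3, x4, x5, x6).
y1_le_0 = ([F(-6, 10), F(-7, 10), F(-1, 100), F(7, 100)], F(2, 10))
y1_ge_1 = ([F(6, 10), F(7, 10), F(1, 100), F(-7, 100)], F(8, 10))
y2_le_0 = ([F(7, 10), F(4, 10), F(-4, 100), F(1, 10)], F(9, 10))
y2_ge_1 = ([F(-7, 10), F(-4, 10), F(4, 100), F(-1, 10)], F(1, 10))
x1_ge_0 = ([F(-4, 10), F(-13, 10), F(1, 100), F(-7, 100)], F(-2, 10))

first = cut([(5, *y1_le_0)], [(5, *y1_ge_1), (15, *x1_ge_0)])
second = cut([(10, *y2_le_0), (40, *x1_ge_0)], [(10, *y2_ge_1)])
for coeffs, rhs in (first, second):
    print([float(c) for c in coeffs], float(rhs), float(sum(coeffs)))
```

The two results have coefficient vectors $(-3, -3.5, 0.2, 0.35)$ and $(-7, -4, 0.4, -1)$ with right-hand side 1, and coefficient sums $-5.95$ and $-11.6$, in agreement with the text.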

Exercise 6. Consider the example at the end of section 5. Why is it that to
improve the first cut, we replace the disjunction $(y_1 \le 0) \vee (y_1 \ge 1)$ by
$(y_1 \le 0) \vee (y_1 \ge 1,\ x_1 \ge 0)$, and to improve the second cut, we replace
$(y_2 \le 0) \vee (y_2 \ge 1)$ by $(y_2 \le 0,\ x_1 \ge 0) \vee (y_2 \ge 1)$?
Why not the other way? Give a necessary condition and/or a sufficient condition
for an inequality $x_i \ge 0$ (where $x_i$ is a basic variable) to be usable in this role,
i.e. to improve the cut from $y_i \le 0 \vee y_i \ge 1$ when included in one of the terms
of the disjunction.



6 Generating Lift-and-Project Cuts 

The implementation of the disjunctive programming approach into a practical 
0-1 programming algorithm had to wait until the early ’90s. It required not 
only the choice of a specific version of disjunctive cuts, but also a judicious 
combination of cutting with branching, made possible in turn by the discovery 
of an efficient procedure for lifting cuts generated in a subspace (for instance, 
at a node of the search tree) to be valid in the full space (i.e. throughout the 
search tree). 



Deepest Cuts 

As mentioned earlier, if $P_D$ is full dimensional, then facets of $P_D$ correspond to
extreme rays of the reverse polar cone $P_D^*$. To generate such extreme rays, for
each 0-1 variable $x_j$ that is fractional at the linear programming optimum, we
solve a linear program over a normalized version of the cone $P_D^*$ corresponding to
the disjunction $x_j = 0 \vee x_j = 1$, with an objective function aimed at cutting off
the linear programming optimum $\bar{x}$ by as much as possible. This “cut generating
linear program” for the $j$-th variable is of the form

$$\begin{aligned}
\min\ & \alpha \bar{x} - \beta \\
\text{s.t.}\ & \alpha - uA + u_0 e_j \ge 0 \\
& \alpha - vA - v_0 e_j \ge 0 \\
& -\beta + ub = 0 \\
& -\beta + vb + v_0 = 0 \\
& u, v \ge 0
\end{aligned} \qquad (\mathrm{CGLP})_j$$

and (i) $\beta \in \{1, -1\}$, or (ii) $\sum_k |\alpha_k| \le 1$. For details, see [7, 8].

The normalization constraint (i) or (ii) has the purpose of turning the cone
$P_D^*$ into a polyhedron. Several other normalizations have been proposed later,
with pros and cons for each one, and their choice plays an important role in
determining the optimum.

Solving $(\mathrm{CGLP})_j$ yields a cut $\alpha x \ge \beta$, where

$$\alpha_k = \begin{cases}
\max\{u a_k, v a_k\} & k \in N \setminus \{j\} \\
\max\{u a_j - u_0, v a_j + v_0\} & k = j,
\end{cases} \qquad (12)$$

with $a_k$ the $k$-th column of $A$, and $\beta = \min\{ub, vb + v_0\}$. This cut maximizes
the amount $\beta - \alpha \bar{x}$ by which $\bar{x}$ is cut off.


Cut Lifting 

A cutting plane derived at a node of the search tree defined by a subset $F_0 \cup F_1$
of the 0-1 variables, where $F_0$ and $F_1$ index those variables fixed at 0 and 1,
respectively, is valid at that node and its descendants in the tree (where the
variables in $F_0 \cup F_1$ remain fixed at their values). Such a cut can in principle
be made valid at other nodes of the search tree, where the variables in $F_0 \cup
F_1$ are no longer fixed, by calculating appropriate values for the coefficients of
these variables - a procedure called lifting and mentioned in section 1. However,
calculating such coefficients is in general a daunting task, which may require the
solution of an integer program for every coefficient. One important advantage of
the cuts discussed here is that the multipliers $u, u_0, v, v_0$ obtained along with
the cut vector $(\alpha, \beta)$ by solving $(\mathrm{CGLP})_j$ can be used to calculate by closed form
expressions the coefficients $\alpha_h$ of the variables $h \in F_0 \cup F_1$.

While this possibility of calculating efficiently the coefficients of variables
absent from a given subproblem (i.e. fixed at certain values) is crucial for making
it possible to generate cuts during a branch-and-bound process that are valid
throughout the search tree, its significance goes well beyond this aspect. Indeed,
most columns of $A$ corresponding to nonbasic components of $\bar{x}$ typically play
no role in determining the optimal solution of $(\mathrm{CGLP})_j$ and can therefore be
ignored. In other words, the cuts can be generated in a subspace involving only
a subset of the variables, and then lifted to the full space. This is the procedure
followed in [8], where the subspace used is that of the variables indexed by
some $R \subseteq N$ such that $R$ includes all the 0-1 variables that are fractional and
all the continuous variables that are positive at the LP optimum. The lifting
coefficients for the variables not in the subspace, which are all assumed w.l.o.g.
to be at their lower bound, are then given by

$$\alpha_\ell := \max\{u a_\ell, v a_\ell\}, \qquad \ell \in N \setminus R,$$

where $u$ and $v$ are the optimal vectors obtained by solving $(\mathrm{CGLP})_j$.

These coefficients always yield a valid lifted inequality. If normalization (i)
is used in $(\mathrm{CGLP})_j$, the resulting lifted cut is exactly the same as the one that
would have been obtained by applying $(\mathrm{CGLP})_j$ to the problem in the full space.
If other normalizations are used, the resulting cut may differ in some coefficients
(lifting a cut does not in general have a unique outcome).






Cut Strengthening 

The cut $\alpha x \ge \beta$ derived from a disjunction of the form $x_j \in \{0, 1\}$ can be
strengthened by using the integrality conditions on variables other than $x_j$.
Indeed, if $x_k$ is such a variable, the coefficient

$$\alpha_k := \max\{u a_k, v a_k\}$$

can be replaced by

$$\alpha'_k := \min\{u a_k + u_0 \lceil m_k \rceil,\ v a_k - v_0 \lfloor m_k \rfloor\},$$

where

$$m_k := \frac{v a_k - u a_k}{u_0 + v_0}.$$

For a proof of this statement, see [8]. The strengthening “works,”
i.e. produces an actual change in the coefficient, only if $|u a_k - v a_k| \ge u_0 + v_0$.
If this inequality does not hold, then either $u a_k > v a_k$ and $0 > m_k > -1$, or
$u a_k < v a_k$ and $0 < m_k < 1$; in either case, $\alpha'_k = \alpha_k$. Furthermore, the larger the
difference $|u a_k - v a_k|$, the more room there is for strengthening the coefficient
in question.
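
The strengthened coefficient is a closed-form expression and can be evaluated directly. The helper below is a minimal sketch: the name `strengthen` and the numerical values are hypothetical, and the inputs are the already computed products $u a_k$ and $v a_k$ together with positive multipliers $u_0$, $v_0$.

```python
from fractions import Fraction as F
from math import ceil, floor

def strengthen(ua_k, va_k, u0, v0):
    """Strengthened coefficient of an integer-constrained variable x_k, k != j.

    ua_k and va_k are the products u a_k and v a_k; u0 and v0 are assumed
    to be positive.
    """
    m_k = F(va_k - ua_k) / F(u0 + v0)
    return min(ua_k + u0 * ceil(m_k), va_k - v0 * floor(m_k))

# Hypothetical numbers: u a_k = 0, v a_k = 5, u0 = v0 = 1, so m_k = 5/2.
original = max(0, 5)
improved = strengthen(0, 5, 1, 1)
print(original, improved)
```

With these numbers $|u a_k - v a_k| = 5$ exceeds $u_0 + v_0 = 2$, and the coefficient drops from 5 to 3.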

This strengthening procedure can also be applied to cuts derived from disjunctions
other than $x_j \in \{0, 1\}$, including disjunctions with more than two
terms. In the latter case, however, the closed form expression for the values $m_k$
used above has to be replaced by a procedure for calculating those values, whose
complexity is linear in the number of terms in the disjunction (see [10] for
details).



The Overall Cut Generating Procedure 

To summarize briefly the above discussion, the actual cut generating procedure 
is not just “lift and project,” but rather RLPLS, an acronym for 

— RESTRICT the problem to a subspace defined from the LP optimum, and 
choose a disjunction; 

— LIFT the disjunctive set to describe its convex hull in a higher dimensional 
space; 

— PROJECT the polyhedron describing the convex hull onto the original (re- 
stricted) space, generating cuts; 

— LIFT the cuts into the original full space; 

— STRENGTHEN the lifted cuts. 

Exercise 7. Generalize the procedure described in the subsection “Deepest cuts”
of section 6 to the case where you want to use a stronger disjunction than
$x_j = 0 \vee x_j = 1$, namely the one implied by imposing the 0-1 condition on two
variables rather than just one: $x_i \in \{0, 1\}$, $x_j \in \{0, 1\}$:

(a) state the resulting (4-term) disjunction,

(b) formulate the corresponding $(\mathrm{CGLP})_{ij}$,

(c) given an optimal solution to $(\mathrm{CGLP})_{ij}$, state the associated cut and derive
the expression corresponding to (12).

7 Branch-and-Cut and Computational Experience 

No cutting plane approach known at this time can solve large, hard integer pro- 
grams just by itself. Repeated cut generation tends to produce a flattening of 
the region of the polyhedron where the cuts are applied, as well as numerical 
instability which can only partly be mitigated by a tightening of the tolerance 
requirements. Therefore the successful use of cutting planes requires their com- 
bination with some enumerative scheme. One possibility is to generate cutting 
planes as long as that seems profitable, thereby creating a tighter linear program- 
ming relaxation than the one given originally, and then to solve the resulting 
problem by branch and bound. Another possibility, known as branch-and-cut, 
consists of branching and cutting intermittently; i.e., when the cut generating 
procedure “runs out of steam,” move elsewhere in the feasible region by branch- 
ing. This approach depends crucially on the ability to lift the cuts generated at 
different nodes of the search tree so as to make them valid everywhere. 

The first successful implementation of lift-and-project for mixed 0-1 programming
in a branch-and-cut framework was the MIPO (Mixed Integer Program
Optimizer) code described in [8]. The procedure it implements can be outlined
as follows.

Nodes of the search tree (subproblems created by branching) are stored along
with their optimal LP bases and associated bounds. Cuts that are generated are
stored in a pool. The cut generation itself involves the RLPLS process described
earlier. Furthermore, cuts are not generated at every node, but at every $k$-th
node, where $k$ is a cutting frequency parameter.

At any given iteration, a subproblem with best (weakest) lower bound is
retrieved from storage and its optimal LP solution $\bar{x}$ is recreated. Next, all those
cuts in the pool that are tight for $\bar{x}$ or violated by it are added to the constraints
and $\bar{x}$ is updated by reoptimizing the LP.

At this point, a choice is made between generating cuts or skipping that 
step and going instead to branching. The frequency of cutting is dictated by the 
parameter k that is problem dependent, and calculated after generating cuts at 
the root node. Its value is a function of several variables believed to characterize 
the usefulness of cutting planes for the given problem, primarily the average 
depth of the cuts obtained. In our experiments, the most frequently used value 
of k was 8. 

If cuts are to be generated, this happens according to the RLPLS scheme
described above. First, a subspace is chosen by retaining all the 0-1 variables
fractional and all the continuous variables positive at the LP optimum, and
removing the remaining variables along with their lower and upper bounds. A
lift and project cut is then generated from each disjunction $x_j \in \{0, 1\}$ for
$j$ such that $0 < \bar{x}_j < 1$ (this is called a round of cuts). Each cut is lifted and
strengthened; and if it differs from earlier cuts sufficiently (the difference between
cuts is measured by the angle between their normals), it is added to the pool;
otherwise it is thrown away. After generating a round of cuts, the current LP is
reoptimized again.
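
The pool test based on the angle between cut normals can be sketched as follows. The text does not give the threshold used in MIPO, so the parameter `max_cosine` and its default value are assumptions, as is the function name; cut normals are assumed to be nonzero.

```python
import math

def is_sufficiently_different(cut, pool, max_cosine=0.999):
    """Accept `cut` only if the cosine of the angle between its normal and
    every normal already in `pool` stays below `max_cosine` (hypothetical
    threshold). Normals are assumed to be nonzero vectors."""
    norm = math.sqrt(sum(c * c for c in cut))
    for other in pool:
        other_norm = math.sqrt(sum(c * c for c in other))
        cosine = sum(a * b for a, b in zip(cut, other)) / (norm * other_norm)
        if cosine >= max_cosine:
            return False
    return True

pool = [[1.0, 0.0]]
print(is_sufficiently_different([2.0, 0.0], pool))  # parallel to a pooled cut
print(is_sufficiently_different([0.0, 1.0], pool))  # orthogonal to it
```

A cut that is a positive multiple of a pooled cut is rejected, while a cut with an orthogonal normal is kept.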

If cuts are not to be generated (or have already been generated), a fractional
variable is chosen for branching on a disjunction of the form $x_j \in \{0, 1\}$; i.e.,
two new subproblems are created, their lower bounds are calculated, and they
are stored. The branching variable is chosen as the one with largest (in absolute
value) cost coefficient among those whose LP optimal value is closest to 0.5.
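
The branching rule just described can be written down directly. This is a sketch of the rule as stated, not the MIPO source: the function name, the argument layout and the tolerance `tol` are assumptions.

```python
def choose_branching_variable(x_bar, cost, binary_indices, tol=1e-6):
    """Among the fractional 0-1 variables whose LP value is closest to 0.5,
    return the index with the largest absolute cost coefficient, or None
    if no 0-1 variable is fractional. `tol` is a hypothetical tolerance."""
    fractional = [j for j in binary_indices if tol < x_bar[j] < 1 - tol]
    if not fractional:
        return None
    best = min(abs(x_bar[j] - 0.5) for j in fractional)
    closest = [j for j in fractional if abs(x_bar[j] - 0.5) <= best + tol]
    return max(closest, key=lambda j: abs(cost[j]))

x_bar = [0.5, 0.5, 0.9, 1.0]
cost = [1.0, -3.0, 10.0, 2.0]
print(choose_branching_variable(x_bar, cost, [0, 1, 2, 3]))
```

Here variables 0 and 1 are tied at distance 0 from 0.5, and variable 1 wins because its cost coefficient is larger in absolute value; variable 2 has a larger cost but is further from 0.5.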



Computational Results in Branch-and-Cut Mode 

The above described procedure was implemented in the code MIPO, described
in detail in [8]. This implementation of MIPO does not have its own linear
programming routine; instead, it calls a simplex code whenever it has to solve
or reoptimize an LP. In the experiments of [8] the LP solver used was that
of CPLEX 2.1. The test bed consisted of 29 test problems from MIPLIB and
other sources, ranging in size from about 30 to 9,000 0-1 variables, 0 to 500
continuous variables, and about 20 to 2,000 constraints. The large majority of
these problems have a real world origin; they were contributed mostly by people
who tried, not always successfully, to solve them. Of the MIPLIB problems,
most of those not included in the test bed were omitted as too easily solved
by straight branch-and-bound; two problems were excluded because their LP
relaxation exceeded the dimensioning of the experiment. MIPO was compared
with MINTO, OSL and CPLEXMIP 2.1 (the most advanced version available at
the time of the experiments). The outcome (see [8] for detailed results) is best
summarized by showing the number of times a code ranked first, and second,
both in terms of search tree nodes and in terms of computing time. This is done
in Table 1, whose two parts correspond to Tables 9 and 10 of [8].



Table 1.

Ranking by number of search tree nodes:

              OSL   CPLEX   MINTO   MIPO
    First      15       0       4     11
    Second      2       4      10     10

Ranking by CPU time:

              OSL   CPLEX   MINTO   MIPO
    First       2       6      10     12
    Second      3       6       3     14

In a sense, MIPO seemed to be the most robust among the four codes: it was
the only one that managed to solve all 29 test problems, and it ranked first or 
second in computing time on 26 out of the 29 instances. 

Other experiments with lift-and-project in an enumerative framework are
reported by S. Thienel, who compares the performance of ABACUS, an
object oriented branch-and-cut code, in two different modes of operation, one
using lift-and-project cuts and the other using Gomory cuts. The outcome is
that the version with lift-and-project cuts is considerably faster on all hard
problems, where hard means requiring at least 10 minutes.

Little experimentation has taken place so far with cuts derived from stronger
disjunctions than the 0-1 condition on a single variable. In [2] the MIPO procedure
was run on maximum clique problems, where the higher dimensional
formulation used to generate cuts was the one obtained by multiplying the constraint
set with inequalities of the form $1 - \sum(x_j : j \in S) \ge 0$, $x_j \ge 0$, $j \in S$,
where $S$ is a stable set. This is the same as the higher dimensional formulation
derived from the disjunction

$$(x_j = 0,\ j \in S) \vee (x_{j_1} = 1,\ x_j = 0,\ j \in S \setminus \{j_1\}) \vee \dots
\vee (x_{j_s} = 1,\ x_j = 0,\ j \in S \setminus \{j_s\}),$$

where $s = |S|$. As this disjunction is more powerful than the standard one, the
cuts obtained were stronger; but they were also more expensive to generate, and
without some specialized code to solve the highly structured cut generating LP’s
of this formulation, the trade-off between the strength of the cuts and the cost
of generating them favored the weaker cuts from the standard disjunction.



Results in Cut-and-Branch Mode 

Bixby, Cook, Cox and Lee report on their computational experience with a
parallel branch-and-bound code, run on 51 test problems after generating several
types of cuts at the root node. One of the cut types used was disjunctive or
lift-and-project cuts, generated essentially as in [8] with normalization (ii), but
without restriction to a subspace and without strengthening. Since deriving these
cuts in the full space is expensive, the routine generating them was activated
only for 4 of the hardest problems. Their addition to the problem constraints
reduced the integrality gap by 58.7%, 94.4%, 99.9% and 94.4%, respectively.

Ceria and Pataki report computational results with a disjunctive cut
generator used in tandem with the CPLEX branch and bound code. Namely,
the cut generator was used to produce 2 and 5 rounds of cuts from the 0-1
disjunctions for the 50 most promising variables fractional at the LP optimum,
after which the resulting problem with the tightened LP relaxation was solved
by the CPLEX 5.0 MIP code. This “cut-and-branch” procedure was tested on
18 of the hardest MIPLIB problems and the results were compared to those
obtained by using CPLEX 5.0 without the cut generator. The outcome of the
comparison can be summarized as follows (see Table 2 of their paper for details).

— The total running time of the cut-and-branch procedure was less than the
time without cuts for 14 of the 18 problems; while the opposite happened
for the remaining 4 problems.

— Two of the problems solved by cut-and-branch in 8 and 3 minutes respectively
could not be solved by CPLEX alone in 20 hours.

— For 6 problems the gain in time was more than 4-fold.

Very good results were obtained on two difficult problems outside the above
set. One of them, set1ch, could not be solved by CPLEX alone, which after
exhausting its memory limitations stopped with a solution about 15% away
from the optimum. On the other hand, running the cutting plane generator for
10 rounds on this problem produced a lower bound within 1.4% of the integer
optimum, and running CPLEX on the resulting tightened formulation solved the
problem to optimality in 28 seconds.

The second difficult problem, seymour, was formulated by Paul Seymour in 
an attempt to find a minimal irreducible configuration in the proof of the four 
color theorem. It was not solved to optimality until very recently. The value of the 
LP relaxation is 403.84, and an integer solution of 423.00 was known. The best 
previously known lower bound of 412.76 had been obtained by running a parallel 
computer with 16 processors for about 60 hours, using about 1400 megabytes of 
memory. The cut-and-branch procedure applied to this case generated 10 rounds 
of 50 cuts in about 10.5 hours, and produced a lower bound of 413.16, using less 
than 50 megabytes of memory. Running CPLEX for another 10 hours on this 
tightened formulation then raised the bound to 414.20. More recently, using the 
same lift-and-project cuts but a more powerful computer, the problem was solved
to optimality [35].

Abstract. We study in this lecture the literature on mixed integer programming
models and formulations for a specific problem class, namely
deterministic production planning problems. The objective is to present 
the classical optimization approaches used, and the known models, for 
dealing with such management problems. 

We describe first production planning models in the general context of 
manufacturing planning and control systems, and explain in which sense 
most optimization solution approaches are based on the decomposition 
of the problem into single-item subproblems. 

Then we study in detail the reformulations for the core or simplest sub- 
problem in production planning, the single-item uncapacitated lot-sizing 
problem, and some of its variants. Such reformulations are either ob- 
tained by adding variables - to obtain so called extended reformulations 
- or by adding constraints to the initial formulation. This typically al- 
lows one to obtain a linear description of the convex hull of the feasible 
solutions of the subproblem. Such tight reformulations for the subprob- 
lems play an important role in solving the original planning problem to 
optimality. 

We then review two important classes of extensions for the production 
planning models, capacitated models and multi-stage or multi-level mod- 
els. For each, we describe the classical modeling approaches used. 
Finally, we conclude by giving our personal view on some new directions 
to be investigated in modeling production planning problems. These in- 
clude better models for capacity utilization and setup times, new models 
to represent the product structure - or recipes - in process industries, 
and the study of continuous time planning and scheduling models as 
opposed to the discrete time models studied in this review. 



1 Introduction 

Two chapters in this book present the general theory of mixed integer programming
[→ Martin], and the lift-and-project approach to combinatorial optimization
[→ Balas]. They propose and study methods for improving the formulations
and solving general mixed integer programs, without considering or
exploiting the specific nature of the problems. In this chapter we take a different
viewpoint. We study the literature on mixed integer programming models
and formulations for a specific problem class, namely deterministic production
planning problems. We illustrate how the specific structure of these problems
can be exploited to obtain good or better formulations and algorithms
for solving them. Implementation issues are not considered in this chapter;
we refer the reader to two chapters, [→ Elf/Gutwenger/Jünger/Rinaldi] and
[→ Ladányi/Ralphs/Trotter].

In manufacturing environments, production planning problems deal with de- 
cisions about the size of the production lots of the different products to manu- 
facture or to process, about the time at which those lots have to be produced, 
and sometimes about the machine or production facility where the production 
must take place. The production decisions are typically taken by looking at the 
best trade-off between financial and customer service or satisfaction objectives. 
In production planning and operations management, the financial objectives are 
usually represented by production costs - for machines, materials, manpower, 
startup costs, overhead costs, ... - and inventory costs - opportunity costs of 
the capital tied up in the stocks, insurances, . . . -. Customer service objectives 
are represented by the ability to deliver the right product, in ordered quantity, 
at the promised date and place. The final aim of such modelling approaches is to 
provide tools allowing one to better plan and control the flow of materials and 
information within the firms. 

These production planning problems abound in practice and the literature 
contains many heuristic and exact optimization algorithmic approaches to solve 
them. Our objective here is to present a classification of the mixed integer pro- 
gramming (MIP) optimization models, and mathematical formulations, used for 
dealing with such management problems. 

In Section 2, we describe production planning models in the general context 
of manufacturing planning and control systems, including Material requirements 
planning (MRP-I), Manufacturing resource planning (MRP-II) and Hierarchical 
production planning (HPP). We describe briefly the elements of production plan- 
ning models, and define a typology of the models encountered in practice. This 
part is a self contained introduction to production planning models, and to make 
the presentation more practical at this early stage, we describe example plan- 
ning models such as the uncapacitated lot-sizing model, the multi-item master 
production scheduling (MPS) model and the materials requirements planning 
basic model. 

Although practical models are multi-item, we explain in which sense most 
optimization solution approaches are based on the decomposition of the problem 
into single-item subproblems. This motivates the analysis of the formulations of 
a variety of single-item models in the next section. The goal is to obtain a 
complete linear description of the convex hull of the feasible solutions of the 
subproblem. Such tight formulations play an important role in solving or finding
good solutions for the original multi-item planning problem. 

In Section 3, we present known results on the MIP formulation of the basic
or core planning subproblem, namely the single-item uncapacitated lot-sizing
subproblem (ULS), and some of its single-item variants including startup costs,
backlogging, constant capacity restrictions, Wagner-Whitin costs and a profit
maximization objective function. We illustrate on the ULS problem the classical
approach used to study the reformulations of these polynomially solvable
single-item problems. The reformulations are obtained either by adding
variables, giving so-called extended reformulations, or by adding constraints
to the initial formulation.

For the ULS problem, several extended formulations are described, including 
the shortest path formulation that can be derived directly from the polynomial 
time dynamic program solving the problem. Other classical extended reformula- 
tions such as multicommodity and facility location reformulations are presented. 

The alternative approach is to work with the initial set of variables, and derive 
valid inequalities that describe the convex hull of the incidence vectors of feasible 
solutions. By solving the separation problem associated with such inequalities, 
it is possible to develop a branch and cut algorithm for solving these problems. 
For each model studied in this section, we describe the known valid inequalities
and indicate whether or not they describe the convex hull of solutions.

In Sections 4 and 5, we review two important classes of extensions for the 
production planning models, capacitated models and multi-stage or multi-level 
models. 

We separate capacitated planning models into big buckets (large time pe- 
riods) and small buckets (small time periods) models. Big buckets capacitated 
models are used in order to control the capacity utilization in each period. We 
describe how to obtain classes of valid inequalities to strengthen the formulation 
of these models. Small buckets models are used when, in order to model the
capacity utilization accurately, one has to control the production sequence of
the different items, because there are variable, sequence dependent changeover
costs and times. We describe several formulations that are used to represent
these production sequences using small buckets.

For multi-stage models, we explain the classical stage by stage decomposition 
used in MRP-I, and describe the well known echelon stock reformulation which 
is the basis of all MIP approaches for these problems. 

In Section 6, we give our personal view on some new directions to be inves- 
tigated in modeling production planning problems. These include better models 
for capacity utilization and setup times, new models to represent the product 
structure - or recipes - in process industries, and the study of continuous time 
planning and scheduling models as opposed to the discrete time models studied 
in this review. 

We conclude with some challenges for the future of this field of research. 


2 Production Planning

2.1 Manufacturing Planning and Control Systems 

The production process can be defined as the transformation process of raw 
materials into end products, usually through a series of transformation steps 
producing and consuming intermediate products. These raw materials, interme- 
diate and end products can often be inventoried, allowing one to produce and 
consume them at different moments and rates in time. Each transformation step 
may require several input products and may produce one or several outputs. The 
raw materials are purchased from suppliers, and the end products are sold to 
external customers. Sometimes, intermediate products are also sold to customers
(spare parts, ...). This general definition of production as a transformation
process is illustrated in Figure 1, where materials inventories are represented
by triangles, transformation processes are represented by circles, and flows of
materials through the process (i.e. inputs into or outputs from transformation
steps) are represented by arrows.




Fig. 1. The production process 



Production planning is defined as the planning (that is the acquisition, time 
of usage, quantity used, . . . ) of the resources required to perform these trans- 
formation steps, in order to satisfy the customers in the most efficient or eco- 
nomical way. In other words, the production decisions are typically taken by 
looking at the best trade-off between financial objectives and customer service 
or satisfaction objectives. In production planning and operations management, 
the financial objectives are usually represented by production costs - for ma- 
chines, materials, manpower, startup costs, overhead costs, ...- and inventory 
costs - opportunity costs of the capital tied up in the stocks, insurances, . . . -. 
Customer service objectives are represented by the ability to deliver the right 
product, in ordered quantity, at the promised date and place. 

According to Anthony and Salomon [62], among others, production
planning problems can be classified into strategic, tactical and operational
planning problems.

Fig. 2. An MRP-II system (adapted from [76])



Strategic problems deal with the management of change in the production
process and the acquisition of the resources needed to produce. This includes,
for example, product-mix, plant layout and location, supply chain design and
investment decisions. The objective pursued in solving these strategic problems
is to maintain the competitive advantage and capabilities, and to sustain the
growth rate. This requires modelling the decisions over a long term horizon
using aggregate information.

Tactical planning problems analyse the resource utilization problems over 
a medium term planning horizon using aggregate information. This consists in 
making decisions about, for instance, materials flow, inventory, capacity utiliza- 
tion, maintenance planning. The usual objective at this stage is to improve the 
cost efficiency and the customer’s satisfaction. 

Operational planning problems aim at planning and controlling the execution
of the production tasks. For instance, production sequencing and input/output
analysis models fit into this category. Here the goal is to obtain an efficient and
accurate execution of the plans, over a very short term horizon, and using very
detailed information.

Manufacturing planning and control systems (MPC) (see Vollman, Berry, 
Whybark Hi) are developed to cope with these complex planning environments, 
and integrate these multi-level multi-horizon planning problems into a single 
integrated management system. 

For instance, Figure 2 describes how the tactical and operational planning
problems are integrated in classical manufacturing resources planning (MRP-II) 






Y. Pochet 



systems, an example of an MPC system. In these systems, medium term production
planning (PP) consists in deciding about capacity utilization and aggregate or 
global inventory levels to meet forecasted demand over a medium term horizon 
of about one year. A medium term horizon is needed to be able to take into 
account some seasonal pattern in the demand. Master production scheduling 
(MPS) consists in planning the detailed short term production of end-products 
in order to meet forecasted demand and firm customer orders, taking into account 
the capacity utilization and global inventory levels decided at the PP stage. Here 
the time horizon is usually expressed in weeks and corresponds to the duration 
of the production cycle. Materials requirements planning (MRP-I) establishes 
the short term production plans for all components (intermediate products and 
raw materials) from the production plan of end-products decided at the MPS 
stage, and from the product structure database (Bills of Materials, BOM). Then, 
shop-floor control systems (for manufactured components) and vendor follow-up 
systems (for purchased components) control the very short term execution of the 
plans decided at the MRP-I stage. The time horizon at this last stage is usually 
of a few days. 

Other well known integrated production planning concepts and systems fit
into this general manufacturing planning and control framework. For instance,
the MRP-II system represented in Figure 2 subsumes the original MRP-I system
defined by Orlicky, and follows the hierarchical production planning (HPP)
principles defined by Hax and Meal.

In these systems, the production decisions are taken, one after the other, 
using a sequential (decomposition) approach. For instance, in the classical MRP- 
II systems, the production plans of the end-items are decided first at the MPS 
level, without considering the capacity or inventory available for intermediate 
products and raw materials. Then a tentative production plan for intermediate 
items is decided assuming that the production capacity is infinite (or at least 
does not impose any constraint on the production plans). Finally, the available 
capacity is taken into account, and the production plans are modified (smoothed) 
by finite loading heuristics. 

This sequential approach is suboptimal because it cannot integrate
some important constraints in a global model. The complicating factors and
constraints that would need to be integrated are: multilevel product structures
(raw materials, intermediate products and end products), capacity constraints
(and in particular the presence of setup times), and sequencing aspects that
have to be taken into account because they affect the capacity utilization.

The purpose of this lecture is precisely to describe and study more global 
production planning models addressing these complicating factors. The final 
aim of such modelling approaches is to provide tools allowing one to better plan 
and control the flow of materials and information within the firms. 


2.2 Production Planning Models

We describe in this Subsection the main modelling elements defining production 
planning models, and we provide examples of mixed integer programming formu- 
lations corresponding to the MRP-I, MPS and integrated MPS/MRP decision 
problems. 



Modelling Elements. There are a number of modelling elements present in 
many or most production planning problems. 

— Production planning problems deal with sizing and timing decisions for pro- 
duction lots, 

— defining the availability of resources (machine hours, workforce, subcontract- 
ing, ...), 

— allocating the resources to the production lots, 

— meeting forecasted demand (in a make to stock environment) and/or
customer orders (in a make to order environment),

— and maximizing performance, expressed in terms of production costs, inven- 
tory costs, and customer service level. 

— over a finite planning horizon. 



There are also some complicating modelling elements that are not present in 
all models. 

— Multiple items interacting through shared resources. 

In this case, there is no material flow between the items, but there are 
capacity restrictions linking the items and coming from the shared resources. 
A typical example of such a model is the MPS model described in Subsection 2.2.

— Multiple items interacting through multi-level product structures. 

In the case of multi-level product structures, there are additional material 
flow restrictions because a product can be an input of some production stage 
and also an output of some other production stage, or it may be delivered 
from an external supplier. This creates some precedence constraints between 
the supply and the consumption of that product. These restrictions are usu- 
ally modelled through inventory balance constraints. A typical example of 
such a model is the integrated MPS/MRP model described in Subsection 2.2.

— Demand backlogging. 

In this case, it is possible - but penalized because it has a negative impact 
on customer satisfaction - to deliver a customer later than required. This 
occurs for example when a factory does not have enough capacity to deliver 
all customers on time. The single item model with demand backlogging will
be studied in Section 3.5.

— Startup or switching capacity utilization.

In many cases, it is necessary to model the capacity utilization accurately in
order to obtain feasible production plans. This sometimes requires modelling
the capacity consumed when a machine starts a production batch, or when
a machine switches from one product to another. In these cases, we obtain
setup or startup time models, changeover time models, or models with
sequencing restrictions. These models will be described in Section 4.

In some other cases, the models are too complex to be solved with such setup
or startup time restrictions, and simpler models involving setup or startup
costs are usually considered. Such models can be seen as obtained by relaxing
the setup or startup time restrictions. Models with setup and startup costs
will be studied in Sections 3 and 3.5, respectively.



Models Classification. The classification of deterministic production planning
models given in Table 1 is adapted from Kuik et al. In each model class, we
cite seminal or important work related to the optimization of some corresponding
models. Some more references will be given in the text. We refer to the above
paper for a more comprehensive classification of the related literature.

The models are classified along three criteria: capacitated or uncapacitated, 
constant or variable demand, single or multiple item. 



Uncapacitated Lot-Sizing Model. The first example model is the single item,
single level, uncapacitated lot-sizing problem that we will study in detail
in Section 3. This model is the core subproblem in production planning
because it is the problem solved repeatedly for each item (from end
products to raw materials) in the material requirements planning (MRP-I)
sequential planning system.

We define the index $t = 1, \ldots, n$ to represent the discrete time periods,
where $n$ is the final period at the end of the planning horizon. The purpose
is to plan the production over the planning horizon (i.e. fix the lot size in
each period) in order to satisfy demand, and to minimize the sum of production
and inventory costs. Classically, the production costs exhibit some economies
of scale that are modelled through a fixed charge cost function. That is, the
production cost of a lot is decomposed into a fixed cost independent of the lot
size, and a unit cost incurred for each unit produced in the lot. The inventory
costs are modelled by charging an inventory cost per unit held in inventory at
the end of each period. Any demand in a period can be satisfied by production
or inventory, and backlogging is not allowed. The production capacity in each
period is not considered in the model, and is therefore assumed to be infinite.

Table 1. Production planning models

                       Uncapacitated                  Capacitated
                       (fixed lead times)             (variable lead times)
-------------------------------------------------------------------------------
Stationary or          Inventory control
constant demand,       EOQ (raw material)
single item            EPQ (production)
                       [Harris, Wilson [80]]
-------------------------------------------------------------------------------
Stationary or          Serial lot sizing              Economic lot scheduling
constant demand,       Assembly lot sizing            (single-level)
multi item             (multi-level)                  (shared resources)
                       [Crowston, W., W. [15]]        [Elmaghraby [18]]
-------------------------------------------------------------------------------
Dynamic demand,        Uncapacitated                  Capacitated
single item            lot sizing (ULS)               lot sizing (CLS)
                       [Wagner, Whitin]               [Florian, Klein]
-------------------------------------------------------------------------------
Dynamic demand,        Serial lot sizing              Big buckets
multi item             Assembly lot sizing            Small buckets
                       General lot sizing             Discrete LS
                       (multi-level)                  (single-level)
                       [Veinott [75]]                 (shared resources)
                       [Zangwill]                     [Eppen, Martin [19]]
                       [Love]                         [Trigeiro]
                       [Crowston, W.]                 [Karmarkar, Schrage [33]]
                       [Afentakis, G., K. [1]]        [Lasdon, Terjung]
                       [Afentakis, G. [2]]            [Fleischmann [21], [22]]
                                                      [Salomon, K., K., V.W.]

For each period $t = 1, \ldots, n$, the decision variables are $x_t$, $y_t$ and
$s_t$. They represent respectively the production lot size in period $t$, the
binary variable indicating whether or not there is a positive production in
period $t$ ($y_t = 1$ if $x_t > 0$), and the inventory at the end of period $t$.
The data are $p_t$, $f_t$, $h_t$ and $d_t$, modelling respectively, for each
period $t$, the unit production cost, the fixed production cost, the unit
inventory cost, and the demand to be satisfied. For simplicity we suppose that
$d_t \ge 0$ for all periods $t$.



The natural formulation of this uncapacitated lot-sizing problem (ULS) can 
be written as follows. 



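\begin{align}
\min \quad & \sum_{t=1}^{n} (p_t x_t + f_t y_t + h_t s_t) \tag{1} \\
& s_{t-1} + x_t = d_t + s_t && \text{for all } t \tag{2} \\
& s_0 = s_n = 0 \tag{3} \\
& x_t \le M y_t && \text{for all } t \tag{4} \\
& x_t, s_t \ge 0, \quad y_t \in \{0,1\} && \text{for all } t \tag{5}
\end{align}
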
where $M$ is a large positive number. Constraint (2) expresses the demand
satisfaction in each period, and is called the flow balance or flow conservation
constraint. This is because every feasible solution of ULS corresponds to a flow
in the network shown in Figure 3, where $d_{1n} = \sum_{t=1}^{n} d_t$ is the
total demand. Constraint (3) says that there is no initial and no final
inventory. Constraint (4) forces the setup variable in period $t$ to be 1 when
there is positive production (i.e. $x_t > 0$) in period $t$. Constraint (5)
imposes the nonnegativity and binary restrictions on the variables. The
objective function defined by (1) is simply the sum of unit production, fixed
production and unit inventory costs.

We refer to the set of feasible solutions to (2)-(5) as the feasible set of ULS.
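
As a concrete reading of the formulation, the following Python sketch evaluates a given production plan against the flow balance, zero initial and final inventory, and nonnegativity requirements, and returns its cost. It is an illustration only: the function name `evaluate_uls_plan` and the list-based data layout are assumptions of this sketch, and it assumes nonnegative fixed costs, so that a setup is paid exactly in the periods with positive production.

```python
def evaluate_uls_plan(p, f, h, d, x):
    """Return the cost of production plan x, or None if the plan is infeasible.

    All arguments are lists indexed by period (position 0 is period 1).
    Hypothetical helper for illustration; assumes f[t] >= 0.
    """
    stock = 0                                # no initial inventory
    cost = 0
    for t in range(len(d)):
        if x[t] < 0:
            return None                      # lot sizes must be nonnegative
        stock = stock + x[t] - d[t]          # flow balance for period t
        if stock < 0:
            return None                      # backlogging is not allowed
        y_t = 1 if x[t] > 0 else 0           # setup only when producing
        cost += p[t] * x[t] + f[t] * y_t + h[t] * stock
    if stock != 0:
        return None                          # no final inventory
    return cost
```

For instance, with the hypothetical data `p = [1, 1, 2, 1]`, `f = [10, 8, 12, 9]`, `h = [1, 2, 1, 0]` and `d = [3, 0, 4, 2]`, the plan `x = [9, 0, 0, 0]` produces everything in the first period and pays one setup plus holding costs, while `x = [3, 0, 4, 2]` pays three setups and no holding cost.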




Fig. 3. Uncapacitated lot-sizing network (n = 4) 



Master Production Scheduling Model. The next example is known as the 
multi item (single level) capacitated lot-sizing model. This model corresponds 
typically to the problem solved at the master production scheduling level in an
MPC system. The purpose is to plan the production of a set of items, usually
finished products, over a short term horizon corresponding to the total production
cycle of these items. For each item, the model is the same as the ULS model in 
terms of costs and demand satisfaction. In addition, the production plans of the 
different items are linked through capacity restrictions coming from the common 
resources used to produce the items. 

We define the indices $i = 1, \ldots, I$ to represent the set of items whose
production has to be planned, $k = 1, \ldots, K$ to represent the set of shared
resources with limited capacity, and $t = 1, \ldots, n$ to represent the time
periods. The variables $x$, $y$, $s$, and the data $p$, $f$, $h$, $d$, have the
same meaning for each item $i$ as in the model ULS. A superscript $i$ has been
added to represent the item $i$ for which they are each defined.

The data $L_t^k$ represents the available capacity of resource $k$ during period
$t$. The data $a^{ik}$ and $b^{ik}$ represent the amount of capacity of resource
$k$ consumed respectively per unit of item $i$ produced, and for a setup of item
$i$. The coefficient $b^{ik}$ is often called the setup time of item $i$ on
resource $k$, and represents the time spent to prepare the resource $k$ just
before the production of a lot of item $i$. Together with $a^{ik}$, it may also
be used to represent some economies of scale in the productivity factor of item
$i$ on resource $k$.

The natural formulation of this multi item capacitated lot-sizing model, or
basic MPS model, can be written as follows.

\begin{align}
\min \quad & \sum_{i} \sum_{t} \{ p_t^i x_t^i + f_t^i y_t^i + h_t^i s_t^i \} \tag{6} \\
& s_{t-1}^i + x_t^i = d_t^i + s_t^i && \text{for all } i, t \tag{7} \\
& x_t^i \le M y_t^i && \text{for all } i, t \tag{8} \\
& \sum_{i} a^{ik} x_t^i + \sum_{i} b^{ik} y_t^i \le L_t^k && \text{for all } t, k \tag{9} \\
& x_t^i, s_t^i \ge 0, \quad y_t^i \in \{0,1\} && \text{for all } i, t \tag{10}
\end{align}

where constraints (7), (8) and (10) are the same as for the ULS model, and
constraint (9) expresses the capacity restriction on each resource $k$ in each
period $t$.
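
The capacity restriction that links the items can be checked mechanically for a tentative set of lot sizes. The Python sketch below computes, for every period and resource, the load made of per-unit consumption plus setup consumption, and reports where it exceeds the available capacity. The function name `capacity_violations`, the nested-list layout and the argument names `a`, `b` and `cap` are assumptions of this sketch; a setup is assumed to take place exactly when the lot size is positive.

```python
def capacity_violations(a, b, cap, x):
    """List (period, resource, load, capacity) where the capacity is exceeded.

    a[i][k], b[i][k]: capacity of resource k used per unit and per setup of item i
    cap[t][k]:        available capacity of resource k in period t
    x[i][t]:          lot size of item i in period t (setup assumed iff x > 0)
    Periods and resources are reported with 1-based numbers.
    """
    violations = []
    for t in range(len(cap)):
        for k in range(len(cap[t])):
            load = sum(
                a[i][k] * x[i][t] + (b[i][k] if x[i][t] > 0 else 0)
                for i in range(len(x))
            )
            if load > cap[t][k]:
                violations.append((t + 1, k + 1, load, cap[t][k]))
    return violations
```

A finite loading heuristic of the kind used in MRP-II would start from such a list of overloaded periods and shift production to neighbouring periods; the monolithic models of this section instead impose the restriction inside the optimization.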



Material Requirements Planning Model. As a last example model, we
describe the multi-item multi-level capacitated lot-sizing model, which can be
seen as the integration of the previous MPS model for finished products, and
the ULS models for all intermediate products and raw materials, into a single
monolithic model. The purpose of this model is to optimize simultaneously the
production and purchase of all items, from raw materials to finished products,
in order to satisfy for each item the external or independent demand coming from
customers and the internal or dependent demand coming from the production
of other items, over a short term horizon.

The dependency between items is modelled through the definition of
the product structure, also called the bill of materials (BOM). The product
structures are usually classified into Series, Assembly or General structures, see
Figure 4.

Fig. 4. Types of product structures in multi-level models (series, assembly and general structure)

The indices, variables and data are the same as before, except that, for
simplicity, we also use the index $j = 1, \ldots, I$ to identify items. For item
$i$, we use the additional notation $S(i)$ to represent the set of successor
items of $i$, i.e. the items consuming directly some amount of item $i$ when
they are produced. Note that for series and assembly structures, these sets
$S(i)$ are singletons for all items $i$ that are not finished products, and for
a finished product $i$ we always have $S(i) = \emptyset$. For $j \in S(i)$, we
denote by $r^{ij}$ the amount of item $i$ required to make one unit of item $j$.
These values are indicated along the edges $(i,j)$ in Figure 4. This parameter
$r$ is used to identify the dependent demand, whereas $d_t^i$ corresponds to the
independent demand. For each item $i$, we denote by $\gamma^i$ the lead time to
produce or deliver any lot of $i$. More precisely, $x_t^i$ represents the size
of a production or purchase order of item $i$ launched in period $t$, and
delivered in period $t + \gamma^i$.

The natural formulation for the general product structure capacitated multi- 
level lot-sizing model, or the monolithic MRP model, is 



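\begin{align}
\min \quad & \sum_{i} \sum_{t} \{ p_t^i x_t^i + f_t^i y_t^i + h_t^i s_t^i \} \tag{11} \\
& s_{t-1}^i + x_{t-\gamma^i}^i = \Big[ d_t^i + \sum_{j \in S(i)} r^{ij} x_t^j \Big] + s_t^i && \text{for all } i, t \tag{12} \\
& x_t^i \le M y_t^i && \text{for all } i, t \tag{13} \\
& \sum_{i} a^{ik} x_t^i + \sum_{i} b^{ik} y_t^i \le L_t^k && \text{for all } t, k \tag{14} \\
& x_t^i, s_t^i \ge 0, \quad y_t^i \in \{0,1\} && \text{for all } i, t \tag{15}
\end{align}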



where the only difference with respect to the previous MPS model resides in the
form of the flow conservation constraint (12). For each item $i$ in each period
$t$, the amount delivered from production or vendors is $x_{t-\gamma^i}^i$,
ordered in period $t - \gamma^i$, and the demand to be satisfied is the sum of
the independent demand $d_t^i$ and the dependent demand
$\sum_{j \in S(i)} r^{ij} x_t^j$ implied by the production of the immediate
successors $j \in S(i)$.
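
The multi-level flow conservation constraint can be read as a simple stock recursion for each item. The Python sketch below computes the end-of-period inventory of one item from the orders of that item and of its immediate successors. The function name `inventory_profile`, the dictionary-based layout and the argument names are assumptions of this sketch; it further assumes a zero initial stock and no orders launched before the first period. A negative value in the result signals a period in which the plan is infeasible for that item.

```python
def inventory_profile(item, d, x, r, successors, lead_time, n):
    """End-of-period stock of one item over n periods (position 0 is period 1).

    d[i][t], x[i][t]: independent demand and order size of item i in period t
    r[(i, j)]:        units of item i needed per unit of successor j
    successors[i]:    the set of immediate successors of item i
    lead_time[i]:     lead time of item i, in periods
    """
    stock, profile = 0, []
    for t in range(n):
        launched = t - lead_time[item]       # period in which today's delivery was ordered
        delivered = x[item][launched] if launched >= 0 else 0
        dependent = sum(r[(item, j)] * x[j][t] for j in successors[item])
        stock = stock + delivered - d[item][t] - dependent
        profile.append(stock)
    return profile
```

In a two-item example where component `B` enters finished product `A` with two units per unit of `A` and a lead time of one period, an order of 6 units of `B` launched in period 1 is exactly what is needed to produce 3 units of `A` in period 2.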



Mathematical Programming Models 






Because of the multi-level structure, the presence of single item ULS models
as submodels is less obvious, but we will show in Section 5 how to reformulate
this model in the form of single item ULS models linked by capacity and product
structure restrictions.



2.3 Optimization Methods 

We are interested in this lecture in the optimization approaches used to solve 
production planning problems. This means that we want either to find provably 
optimal solutions, or to find near-optimal solutions with a performance guar- 
antee, expressed usually in terms of a percentage of deviation of the objective 
value from the optimal value. 

As we have seen in the examples from Section 2.2, many or most multi item
production planning problems can be modelled as the following mixed integer
programming (MIP) model



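\begin{align}
MIP = \min \quad & \sum_{i} \sum_{t} \{ p_t^i x_t^i + h_t^i s_t^i + f_t^i y_t^i \} \tag{16} \\
& (x^i, s^i, y^i) \in W^i \quad \text{for all } i \tag{17} \\
& (x^i, s^i, y^i)_{i=1,\ldots,I} \in P \tag{18}
\end{align}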



where the index $t$ is used to represent time periods, the index $i$ is used to
represent the different items, $W^i$ represents the set of feasible solutions
(i.e. lot sizes, setups and inventory levels) to the item $i$ lot-sizing problem
(like ULS, or one of its variants studied in Section 3), and $P$ represents the
set of solutions satisfying a set of coupling linear constraints (like the
capacity constraints (9) in Section 2.2, among others).

Most optimization methods are based on easy to solve relaxations of the 
initial problem, either to prove optimality, or to provide a performance guarantee 
of some near-optimal solution. For example, the above problem can be solved by 
some standard MIP software using a branch and bound approach based on the 
following linear relaxation 



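\begin{align}
LR = \min \quad & \sum_{i} \sum_{t} \{ p_t^i x_t^i + h_t^i s_t^i + f_t^i y_t^i \} \tag{20} \\
& (x^i, s^i, y^i) \in W_{LR}^i \quad \text{for all } i \tag{21} \\
& (x^i, s^i, y^i)_{i=1,\ldots,I} \in P \tag{22}
\end{align}

where $W_{LR}^i$ represents the linear relaxation of the set $W^i$ obtained by
relaxing the constraints $y_t^i \in \{0, 1\}$ into $0 \le y_t^i \le 1$, for all
$i, t$.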
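
A small worked example, constructed here for illustration and not taken from the text, shows how weak this linear relaxation can be. Consider a single item ULS instance with two periods, zero unit production costs, and a big-M value equal to the total demand. The data values below are assumptions chosen to keep the arithmetic simple.

```latex
% Data: n = 2,  d = (1, 1),  p = (0, 0),  f = (10, 10),  h_1 = 10,  M = 2.
%
% Integer optimum: either produce both units in period 1 or one unit per period.
\[
  f_1 + h_1 \cdot 1 = 20, \qquad f_1 + f_2 = 20, \qquad \text{so } MIP = 20 .
\]
% Linear relaxation: since f_t > 0, an optimal solution sets y_t = x_t / M.
% Write x_1 = 1 + \alpha, x_2 = 1 - \alpha, s_1 = \alpha with 0 \le \alpha \le 1.
\[
  \text{cost}(\alpha)
    = 10 \cdot \frac{1 + \alpha}{2} + 10 \cdot \frac{1 - \alpha}{2} + 10 \alpha
    = 10 + 10 \alpha ,
  \qquad \text{so } LR = 10 \text{ at } \alpha = 0 .
\]
```

In this instance the relaxation charges only half of each setup cost, so the bound is half of the integer optimum. Larger values of $M$ make the fractional setup charge, and hence the bound, even smaller.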

Unfortunately, this direct MIP approach can only be used for the solution of
small size problems. In order to solve or to find good solutions for more realistic
or real size problems, one has to work with better or tighter relaxations. Because
of the structure of the initial problem, most efficient solution approaches are based
on the following relaxation



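\begin{align}
LB = \min \quad & \sum_{i} \sum_{t} \{ p_t^i x_t^i + h_t^i s_t^i + f_t^i y_t^i \} \tag{24} \\
& (x^i, s^i, y^i) \in conv(W^i) \quad \text{for all } i \tag{25} \\
& (x^i, s^i, y^i)_{i=1,\ldots,I} \in P \tag{26}
\end{align}

where $conv(W^i)$ represents the convex hull of the solutions of $W^i$.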

This bound LB is never worse, and typically much tighter than the bound 
LR, and is precisely the relaxation bound exploited in many different methods. 

— Lagrangean relaxation methods and Dantzig-Wolfe or column generation
methods give rise to schemes where the bound LB is iteratively computed by
relaxing or eliminating the coupling constraints (18), and solving separately
for each item $i$ the resulting single item subproblems. These subproblems are
optimization problems defined over the sets $W^i$. See also Lemaréchal
[→ Lemaréchal] for a general presentation of Lagrangean relaxation.

— If a compact (small size) linear description of $conv(W^i)$ is available for
each item $i$, then by adding the coupling constraints one gets a formulation
whose linear relaxation gives the bound LB. Then a direct branch and bound
approach can be used with this formulation.

— If a linear but non compact description of $conv(W^i)$ is available for each
item $i$, using the corresponding formulation directly to compute LB takes
too much time. Then the classical approach is to add (some of) the constraints
defining $conv(W^i)$ in the course of optimization, instead of a priori,
by solving a so-called separation problem over the set $conv(W^i)$. This gives
branch and cut type methods.
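
To make the first of these approaches concrete, the Python sketch below evaluates the Lagrangean lower bound for a fixed vector of nonnegative multipliers when the coupling constraints are capacity constraints on a single shared resource. Dualizing the capacity constraints adds the multiplier-weighted resource consumption to the unit and fixed costs of each item, after which every item is an independent ULS problem. Everything here is an assumption of the sketch rather than the method of any cited paper: the function names, the single-resource restriction, the data layout, and the particular O(n^2) recursion used to solve the single item subproblems (which assumes nonnegative fixed costs and demands). A full method would also update the multipliers, for example by subgradient steps, which is not shown.

```python
def uls_value(p, f, h, d):
    """Optimal value of a single item ULS instance (assumes f >= 0 and d >= 0)."""
    n = len(d)
    best = [0] + [float("inf")] * n        # best[t]: min cost of serving periods 1..t
    for j in range(1, n + 1):              # j: period of the last setup
        if d[j - 1] == 0:
            best[j] = min(best[j], best[j - 1])   # period j needs no production
        cost, unit = best[j - 1] + f[j - 1], p[j - 1]
        for t in range(j, n + 1):
            cost += unit * d[t - 1]        # demand of period t produced in period j
            best[t] = min(best[t], cost)
            unit += h[t - 1]               # one more period of holding
    return best[n]


def lagrangean_bound(items, a, b, cap, u):
    """Lower bound obtained by dualizing the capacity constraints with u >= 0.

    items[i] = (p, f, h, d) for item i; a[i], b[i]: capacity used per unit and
    per setup of item i; cap[t]: capacity in period t; u[t]: multiplier of period t.
    """
    total = -sum(u_t * cap_t for u_t, cap_t in zip(u, cap))
    for i, (p, f, h, d) in enumerate(items):
        p_u = [p_t + u_t * a[i] for p_t, u_t in zip(p, u)]   # priced-out unit costs
        f_u = [f_t + u_t * b[i] for f_t, u_t in zip(f, u)]   # priced-out setup costs
        total += uls_value(p_u, f_u, h, d)
    return total
```

With zero multipliers the bound is simply the sum of the uncapacitated single item optima; positive multipliers on overloaded periods can only be justified if they raise the bound, and for any nonnegative multipliers the value stays below the cost of every capacity-feasible plan.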



Therefore, in order to design optimization methods for solving complex multi 
item production planning problems, one has to design algorithms to solve the 
single item subproblem, or one has to find a compact complete linear description 
for the subproblem, or one has to find a complete linear description with an 
efficient separation algorithm for the subproblem. 


3 The Core Subproblem: Single Item Uncapacitated Lot-Sizing

First, we study in detail in this Section the uncapacitated lot-sizing problem
(ULS) already introduced in Section 2.2. Next, we will give the main optimization
and reformulation results known for the main variants of this single item problem.
A number of formulations and the general structure of this Section are imported
from the more technical paper by Pochet and Wolsey, and from Chapter 13
of Wolsey [83].



3.1 Basic Formulation and Motivation 

For the ease of reading, we have tried to separate as much as possible the previous 
modelling Section from this more technical Section. Therefore, we recall here the 
basic formulation of the ULS problem. 



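\begin{align}
\min \quad & \sum_{t=1}^{n} (p_t x_t + f_t y_t + h_t s_t) \tag{28} \\
& s_{t-1} + x_t = d_t + s_t && \text{for all } t \tag{29} \\
& s_0 = s_n = 0 \tag{30} \\
& x_t \le d_{tn} y_t && \text{for all } t \tag{31} \\
& x_t, s_t \ge 0, \quad y_t \in \{0,1\} && \text{for all } t \tag{32}
\end{align}
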



where, in constraint (31), the large upper bound $M$ on $x_t$ has been replaced
by its true upper bound $d_{tn}$, equal to the sum of the demands from period
$t$ up to the end of the horizon. In general, we define $d_{kl}$ by
$d_{kl} = \sum_{t=k}^{l} d_t$ for $k \le l$.

We refer to the set of feasible solutions to (29)-(32) as the feasible set of ULS.
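
The tightened upper bounds on the lot sizes are simply the suffix sums of the demand vector. A minimal Python sketch, in which the function name `remaining_demand` is an assumption of this illustration:

```python
from itertools import accumulate


def remaining_demand(d):
    """Return the list of sums of d from each period up to the last one."""
    # Accumulate from the last period backwards, then restore the period order.
    return list(accumulate(reversed(d)))[::-1]
```

For the hypothetical demand vector `d = [3, 0, 4, 2]`, the first entry of the result is the total demand and the last entry is the demand of the final period alone.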

The first motivation to study this specific ULS problem is that ULS is solved 
repeatedly in MRP systems for each component, backward in the product struc- 
ture from end products to raw materials. This approach, to define the production 
plans using a sequential item by item and uncapacitated model, has important 
shortcomings. By solving the capacity problems only after the solution of ULS 
for all items, and by neglecting the multi-level product structure, the production 
plans obtained are suboptimal in terms of cost and flexibility. The extensions to
capacitated models and multi-level models are discussed in Sections 4 and 5,
respectively.

The other motivation to study ULS comes from the fact that it is the most
common subproblem arising in complex multi-item production planning problems,
and we have already seen in Section 2.3 the central role of single item
subproblems in designing decomposition or branch and cut solution approaches
for multi-item problems.
3.2 Dynamic Programming Algorithm

The first natural question to ask in studying a subproblem is the following. 

Question 1: Is optimization over $X^{ULS}$ polynomially solvable?

The answer is yes, and comes from a well known decomposition property of
optimal solutions (see, for example, Zangwill [?]). This property means that
there always exists an optimal solution where production only occurs in a period
$t$ when the entering stock $s_{t-1}$ is zero.

Theorem 1 There exists an optimal solution to problem ULS (28)-(32) where
$s_{t-1} x_t = 0$ for all $t$.

Proof. Production may only occur in periods $t$ with $y_t = 1$. For any selection
of such production periods, the problem reduces to a minimum cost flow problem
through the network shown in Figure 3. In any extreme solution, the arcs
with positive flow form an acyclic subgraph and any acyclic subgraph satisfies
$s_{t-1} x_t = 0$ for all $t$. ■

There are a number of algorithms based on this decomposition property. The
seminal paper of Wagner and Whitin [?] describes an $O(n^2)$ dynamic programming
algorithm solving ULS. More efficient implementations of this algorithm
running in $O(n \log n)$ have been proposed by Federgruen and Tzur [20], Wagelmans,
van Hoesel and Kolen [?], and Aggarwal and Park [3].

We describe here a simple forward dynamic program with a trivial implementation
in $O(n^2)$. For simplicity of notation, we first eliminate the stock cost
coefficients from the objective function using the flow balance constraints (29).
That is, we replace $h_t s_t$ by $h_t [\sum_{i=1}^{t} x_i - d_{1,t}]$ in the objective (28), for all $t$.
(Recall that $d_{i,t} = \sum_{k=i}^{t} d_k$.) Observe also that if $f_t < 0$ for some $t$ in the
objective (28), then $y_t = 1$ in all optimal solutions to (28)-(32). Therefore, problem
ULS can be solved using the following objective function.

$$\min \sum_{t=1}^{n} (c_t x_t + f_t^+ y_t) \tag{33}$$

with $c_t = p_t + \sum_{i=t}^{n} h_i$ and $f_t^+ = \max\{f_t, 0\}$. In other words, to obtain the optimal
value of ULS (28)-(32), it suffices to add $\sum_{t=1}^{n} f_t^-$, where $f_t^- = \min\{f_t, 0\}$, to
the optimal value of the transformed problem (33), (29)-(32). From now on,
and unless otherwise mentioned, we will work with this transformed objective
function.
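The elimination of the stock costs can be checked in two lines. The derivation below is only a sketch: it uses the flow balance constraints (29) with $s_0 = 0$, and the exchange of the order of summation is the one step that the text leaves implicit.

```latex
s_t = \sum_{i=1}^{t} x_i - d_{1,t}
\quad\Longrightarrow\quad
\sum_{t=1}^{n} h_t s_t
  = \sum_{t=1}^{n} h_t \sum_{i=1}^{t} x_i - \sum_{t=1}^{n} h_t d_{1,t}
  = \sum_{i=1}^{n} \Big( \sum_{t=i}^{n} h_t \Big) x_i - \sum_{t=1}^{n} h_t d_{1,t}
```

The coefficient of $x_i$ in the objective therefore becomes $p_i + \sum_{t=i}^{n} h_t = c_i$. The last term depends only on the data, so it does not change which solutions are optimal.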

Now, let $H(k)$ be the cost of the minimum cost solution for problem ULS
restricted to periods 1 up to $k$. From Theorem 1, there exists a minimum cost
solution of value $H(k)$ such that, if period $t \le k$ is the last production period
up to period $k$, then $x_t = d_{tk}$, $x_l = 0$ for $l = t+1, \dots, k$, and the cost of the
solution for periods 1 to $t-1$ must be $H(t-1)$. This holds by the optimality
principle, and the fact that $s_{t-1} = 0$ when $x_t > 0$. This allows us to define the
following recursion to compute $H(k)$.

$$H(k) = \min_{1 \le t \le k} \{ H(t-1) + f_t^+ + c_t d_{tk} \} \tag{34}$$

Starting from $H(0) = 0$, and computing $H(k)$ using (34) for $k = 1, \dots, n$
leads to the value

$$H(n) + \sum_{t=1}^{n} f_t^- \tag{35}$$

of an optimal solution to ULS. It is easy to check that a direct implementation
of this recursion leads to an $O(n^2)$ algorithm for ULS.

3.3 Extended Reformulations

Given the equivalence between optimization and separation (see Grötschel,
Lovász and Schrijver [?]), and given the fact that ULS is polynomially solvable
and appears as a subproblem in many complex multi-item lot-sizing problems,
the next question to ask in order to solve these problems by branch and bound
is the following.

Question 2: Is there a compact linear description for ULS?

The answer is again yes for such a simple lot-sizing problem, and we describe
three of the popular extended reformulations for ULS.

Facility Location. The first reformulation takes the form of a simple facility
location model, defined for facilities and customers located on a line, and
introduced by Krarup, Bilde [33]. It consists in locating facilities ($y_t = 1$ if a facility
is located at location $t$) to serve customer demands on a one way road ($d_t$ is the
demand to meet at location $t$) and minimize installation costs ($f_t$ at location $t$),
unit production costs ($p_t$ at location $t$) and unit transportation costs ($h_t$ from
location $t$ to location $t+1$).

If we define the variable $w_{st}$ as the fraction of the demand $d_t$ served from a
production in period $s$ (i.e. a facility located in $s$), for all $1 \le s \le t \le n$, and $y_s$
as the usual 0/1 setup variable for period $s = 1, \dots, n$, then, by analogy to the
location model, ULS can be reformulated as
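$$\text{(FL)} \quad \min \sum_{t=1}^{n} f_t y_t + \sum_{s=1}^{n} \sum_{t=s}^{n} c_s d_t w_{st} \tag{36}$$
$$\sum_{s=1}^{t} w_{st} = 1 \quad \text{for all } t \tag{37}$$
$$w_{st} \le y_s \quad \text{for all } 1 \le s \le t \le n \tag{38}$$
$$w_{st}, y_t \ge 0 \quad \text{for all } 1 \le s \le t \le n \tag{39}$$
$$y_t \le 1 \quad \text{for all } t \tag{40}$$
$$y_t \text{ integer} \quad \text{for all } t \tag{41}$$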



where constraint (37) expresses the demand satisfaction in period $t$ from
productions in periods $s \le t$, constraint (38) imposes a setup in period $s$ when there
is production in $s$ to satisfy some demand in period $t \ge s$, constraints (39)-(41)
impose the nonnegativity and binary restrictions on the variables, and the objective
function (36) expresses the cost of the production plan in the form (33).

Formulation (36)-(41) is a valid reformulation for ULS, but due to the
particular structure of the objective function, we have the following stronger result
from Krarup, Bilde [33] (see also Bárány, Van Roy and Wolsey [?] for a
primal-dual proof).

Theorem 2 The linear programming relaxation (36)-(40) of FL always has an
optimal solution with $y$ integer, and solves ULS.

Note that the polyhedron (37)-(40) has fractional vertices. However, as there is
always an optimal integral solution to (36)-(40), this linear formulation suffices to
solve ULS. In order to obtain an integral polyhedron corresponding to the convex
hull of solutions to ULS (i.e. a linear formulation with all integral vertices), one
needs to add the constraints $w_{st} \ge w_{s,t+1}$ for $1 \le s \le t < n$ to (37)-(40).
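To make the variables of FL concrete, the following Python sketch builds the 0/1 point $(y, w)$ that corresponds to a given set of setup periods and checks it against (37), (38) and the additional constraints $w_{st} \ge w_{s,t+1}$. The function names and the dictionary layout are illustrative assumptions and do not come from the text; each demand is assigned to the last setup period, which is consistent with the decomposition property of Theorem 1.

```python
def fl_point_from_setups(setups, n):
    """Return (y, w) for the plan whose setup periods are `setups`.

    Each demand period t is served entirely by the last setup period s <= t.
    Periods are numbered 1..n.
    """
    setup_set = set(setups)
    if 1 not in setup_set:
        raise ValueError("period 1 needs a setup, otherwise d_1 cannot be served")
    y = {t: int(t in setup_set) for t in range(1, n + 1)}
    w = {}
    last = None
    for t in range(1, n + 1):
        if t in setup_set:
            last = t
        for s in range(1, t + 1):
            w[s, t] = int(s == last)
    return y, w


def satisfies_fl(y, w, n):
    """Check (37), (38) and the extra constraints w_st >= w_{s,t+1}."""
    demand_served = all(sum(w[s, t] for s in range(1, t + 1)) == 1
                        for t in range(1, n + 1))                  # (37)
    setup_forced = all(w[s, t] <= y[s]
                       for t in range(1, n + 1)
                       for s in range(1, t + 1))                   # (38)
    monotone = all(w[s, t] >= w[s, t + 1]
                   for t in range(1, n)
                   for s in range(1, t + 1))
    return demand_served and setup_forced and monotone


def fl_cost(y, w, c, f, d):
    """Objective (36); c, f and d are dicts indexed by period 1..n."""
    n = len(d)
    return (sum(f[t] * y[t] for t in range(1, n + 1))
            + sum(c[s] * d[t] * w[s, t]
                  for t in range(1, n + 1)
                  for s in range(1, t + 1)))
```

For instance, with setups in periods 1 and 3 of a 3-period instance, demands 1 and 2 are served from period 1 and demand 3 from period 3.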

Multicommodity. A classical way to tighten the formulation of fixed charge 
network flow problems is to decompose the flow along each arc of the network as a 
function of its destination. This defines a so-called multicommodity formulation 
by assigning a different commodity to each destination node, see Rardin and 
Choe [60] . 

We define the variable $x_{it}$ (resp. $s_{it}$) as the production of commodity $t$ (resp.
inventory of commodity $t$) in period $i$. Commodity $t$ corresponds to the demand
delivered in period $t$. By decomposing the flow by commodity, ULS can be
reformulated as

$$\text{(MC)} \quad \min \sum_{i=1}^{n} \sum_{t=i}^{n} c_i x_{it} + \sum_{i=1}^{n} f_i y_i \tag{42}$$
$$s_{i-1,t} + x_{it} = \delta_{it} d_t + s_{it} \quad \text{for } 1 \le i \le t \le n \tag{43}$$
$$s_{0t} = s_{tt} = 0 \quad \text{for } 1 \le t \le n \tag{44}$$
$$x_{it} \le d_t y_i \quad \text{for } 1 \le i \le t \le n \tag{45}$$
$$x_{it}, s_{it}, y_i \ge 0, \; y_i \le 1 \quad \text{for } 1 \le i \le t \le n \tag{46}$$
$$y_i \text{ integer} \quad \text{for } 1 \le i \le n \tag{47}$$



where the notation $\delta_{it}$ denotes 1 if $i = t$, and 0 otherwise. Constraint (43) is the
flow conservation constraint of commodity $t$ in period $i \le t$. Constraints (44)
impose that there is no initial and no final inventory (end of period $t$) of commodity $t$.
Constraint (45) forces the setup variable $y_i$ to be 1 when there is production in
period $i$, and, using the decomposition of the flow, the upper bound on $x_{it}$ is $d_t$.
Constraints (46)-(47) impose the nonnegativity and binary restrictions on the
variables and the objective (42) expresses the cost of the production plan.

Formulation (42)-(47) is a valid reformulation for ULS. Using the equations
(43) to eliminate the inventory variables from formulation (MC), and using the
substitution $w_{st} = x_{st}/d_t$ between formulations MC and FL, it is easy to prove
the following result.

Theorem 3 The linear relaxations (42)-(46) of MC and (36)-(40) of FL are
equivalent. Therefore, the linear programming relaxation of MC always has an
optimal solution with $y$ integer, and solves ULS.

Shortest Path. The last reformulation technique that we illustrate here is due
to Martin [33], and first applied to lot-sizing problems by Eppen and Martin
[?]. It can be used to transform the dynamic programming algorithm for ULS
into a pair of primal-dual linear formulations.

The dynamic programming algorithm (34) constructs a least cost succession
of so-called (disjoint) regeneration intervals $[t,k]$, with $t \le k$, covering the
planning horizon from period 1 to period $n$. Each interval $[t,k]$ represents the
satisfaction of the demands in periods $t$ up to $k$ by the production of $d_{tk}$ in
period $t$. This is illustrated in Figure 5 with $n = 7$, where the entire production
plan consists in the succession of 3 regeneration intervals: [1,2], [3,6], [7,7].







[1,2] [3,6] [7,7] 

Fig. 5. An extreme solution as a succession of regeneration intervals 



Using as variables $H(k)$, for $k = 0, \dots, n$, the following linear program
computes the optimal value $H(n) + \sum_{t=1}^{n} f_t^-$ of the dynamic program (34) solving
ULS.

$$\max \; H(n) - H(0) + \sum_{t=1}^{n} f_t^- \tag{48}$$
$$H(k) \le H(t-1) + f_t^+ + c_t d_{tk} \quad \text{for all } 1 \le t \le k \le n \tag{49}$$
$$H(0) = 0 \tag{50}$$

This claim is easy to prove by observing that the constraints (49) are equivalent
to $H(k) \le \min_{1 \le t \le k} \{ H(t-1) + f_t^+ + c_t d_{tk} \}$, for all $k$. The maximization direction
in the objective function (48) suffices to guarantee that the values $H(k)$, for all $k$,
computed in the dynamic program (34) define an optimal solution to (48)-(50).

Associating the dual variable $\phi_{tk}$ to each constraint in (49), the dual of
(48)-(50) can be written as the minimum cost network flow problem



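$$\min \sum_{t=1}^{n} \sum_{k=t}^{n} [c_t d_{tk} + f_t^+] \phi_{tk} + \sum_{t=1}^{n} f_t^- \tag{51}$$
$$\sum_{t=1}^{n} \phi_{tn} = 1 \tag{52}$$
$$\sum_{i=1}^{t} \phi_{it} - \sum_{k=t+1}^{n} \phi_{t+1,k} = 0 \quad \text{for } 1 \le t \le n-1 \tag{53}$$
$$-\sum_{k=1}^{n} \phi_{1k} = -1 \tag{54}$$
$$\phi_{tk} \ge 0 \quad \text{for } 1 \le t \le k \le n \tag{55}$$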



where the constraints (52)-(54) correspond to the flow conservation constraints
of the network flow problem, with one flow constraint for each time period, and
one unit of flow sent from the single source node to the single sink node (see
(52) and (54)). Therefore, the linear program (51)-(55) defines a shortest path problem
from source to sink, and it has 0-1 extreme points. It is easy to see that this
formulation models the possible sequences of regeneration intervals, with $\phi_{tk} = 1$
if the regeneration interval $[t,k]$ is part of the solution at cost $[c_t d_{tk} + f_t^+]$. This
is illustrated in Figure 6 for the case $n = 3$.




Fig. 6. Shortest Path Formulation of ULS (n = 3) 
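Recursion (34) and its reading as a shortest path over regeneration intervals can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation of any of the cited papers: the function names are hypothetical, lists are 0-indexed (index `t-1` holds the data of period `t`), and the result is converted back to the original objective (28) by adding $\sum_t f_t^-$ and subtracting the constant $\sum_t h_t d_{1t}$ created by eliminating the stocks. A brute-force enumeration over setup vectors is included only as a cross-check for tiny instances.

```python
from itertools import product
from math import inf


def transformed_costs(p, f, h):
    """c_t = p_t + sum_{i>=t} h_i and f_t^+ = max(f_t, 0), as used in (33)."""
    n = len(p)
    c, tail = [0.0] * n, 0.0
    for t in range(n - 1, -1, -1):
        tail += h[t]
        c[t] = p[t] + tail
    return c, [max(ft, 0.0) for ft in f]


def solve_uls(p, f, h, d):
    """Forward recursion (34); returns (cost in objective (28), intervals)."""
    n = len(d)
    c, f_plus = transformed_costs(p, f, h)
    H = [0.0] + [inf] * n          # H[k] = H(k), with H(0) = 0
    pred = [0] * (n + 1)           # pred[k] = last production period t <= k
    for k in range(1, n + 1):
        d_tk = 0.0
        for t in range(k, 0, -1):  # arc from node t-1 to node k: interval [t, k]
            d_tk += d[t - 1]
            cand = H[t - 1] + f_plus[t - 1] + c[t - 1] * d_tk
            if cand < H[k]:
                H[k], pred[k] = cand, t
    intervals, k = [], n
    while k > 0:                   # walk the shortest path backwards
        intervals.append((pred[k], k))
        k = pred[k] - 1
    intervals.reverse()
    f_minus = sum(min(ft, 0.0) for ft in f)
    stock_constant = sum(h[t] * sum(d[:t + 1]) for t in range(n))
    return H[n] + f_minus - stock_constant, intervals


def brute_force_uls(p, f, h, d):
    """Reference value obtained by enumerating all setup vectors (tiny n only)."""
    n = len(d)
    c, _ = transformed_costs(p, f, h)
    stock_constant = sum(h[t] * sum(d[:t + 1]) for t in range(n))
    best = inf
    for y in product((0, 1), repeat=n):
        cost = sum(ft for ft, yt in zip(f, y) if yt)
        cheapest = inf
        for t in range(n):
            if y[t]:
                cheapest = min(cheapest, c[t])
            if d[t] > 0:
                cost += d[t] * cheapest   # stays infinite if no setup is open yet
        best = min(best, cost - stock_constant)
    return best
```

The double loop makes the $O(n^2)$ running time of the direct implementation visible: one candidate arc is evaluated for every pair $t \le k$.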



A final reformulation step can be used to obtain an equivalent formulation 
where the setup decisions are made explicit. 
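$$\text{(SP)} \quad \min \sum_{t=1}^{n} \sum_{k=t}^{n} c_t d_{tk} \phi_{tk} + \sum_{t=1}^{n} f_t y_t \tag{56}$$
$$\sum_{t=1}^{n} \phi_{tn} = 1 \tag{57}$$
$$\sum_{i=1}^{t} \phi_{it} - \sum_{k=t+1}^{n} \phi_{t+1,k} = 0 \quad \text{for } 1 \le t \le n-1 \tag{58}$$
$$-\sum_{k=1}^{n} \phi_{1k} = -1 \tag{59}$$
$$\sum_{k=t}^{n} \phi_{tk} \le y_t \quad \text{for all } t \tag{60}$$
$$\phi_{tk}, y_t \ge 0, \; y_t \le 1 \quad \text{for } 1 \le t \le k \le n \tag{61}$$

In order to obtain this formulation, we have first replaced the objective term
$\sum_{t=1}^{n} \sum_{k=t}^{n} f_t^+ \phi_{tk}$ by $\sum_{t=1}^{n} f_t^+ y_t$ and added the setup defining equalities
$\sum_{k=t}^{n} \phi_{tk} = y_t$, for all $t$. Next we have replaced $\sum_{t=1}^{n} f_t^+ y_t + \sum_{t=1}^{n} f_t^-$ by
$\sum_{t=1}^{n} [f_t^+ + f_t^-] y_t = \sum_{t=1}^{n} f_t y_t$, and relaxed $\sum_{k=t}^{n} \phi_{tk} = y_t$ into
$\sum_{k=t}^{n} \phi_{tk} \le y_t$ to be able to force $y_t = 1$ when $f_t < 0$ without perturbing the
flow constraints.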

The above discussion has shown the following. 

Theorem 4 The linear program SP (56)-(61) always has an optimal solution
with $y$ integer, and solves ULS.

Note that here the polyhedron (57)-(61) has no fractional vertices. To relate
this shortest path formulation to the other formulations, one can show that
formulation (56)-(61) is equivalent to the facility location formulation (36)-(40),
augmented with $w_{st} \ge w_{s,t+1}$ for $1 \le s \le t < n$, using the substitution
$\phi_{tk} = w_{tk} - w_{t,k+1}$ for $1 \le t \le k \le n$.



3.4 Complete Linear Description in the Initial Space 

We have described in Section 3.3 several linear formulations for the ULS problem
with $n$ periods, but they involve $O(n^2)$ variables and $O(n^2)$ or $O(n)$ constraints.

Although these formulations are as tight as possible, to be able to solve large
size multi-item production planning problems (with many items and many
periods), we need to reduce the size of the formulation used for the single item
subproblems. One way to achieve this is to look for a complete linear description
of $conv(X^{ULS})$ in the initial variable space with only $O(n)$ variables. In that case, even
if this complete linear description needs an exponential number of constraints,
we do not need to add them all (a priori) to solve a particular instance. If we
add the constraints using a separation algorithm, then at most $O(n)$ variables
and constraints are needed to describe any particular extreme point of $conv(X^{ULS})$.
This is typically what happens when branch and cut is used instead of branch
and bound.

In order to identify a complete linear description of $conv(X^{ULS})$ in the initial
space of variables, we first need to answer the following question.

Question 3: How to find valid inequalities for the set $X^{ULS}$?

To answer that question, we describe the typical way in which a researcher would
proceed in order to identify classes of valid inequalities.

One approach is to find numerically the complete description of small in- 
stances, and to generalize the results to classes of valid inequalities for arbitrary 
data and problem dimension. 

Another approach is to solve the LP relaxations of some instances, and to 
find valid inequalities cutting off the non-integral optimal solutions so obtained, 
and then to generalize the results to classes of valid inequalities for arbitrary 
data and problem dimension. This approach also gives some intuition for the
separation problem.

The example and the procedure used to find valid inequalities are taken from
Wolsey [83]. Consider the fractional solution pictured in Figure 7 and obtained
by solving the linear programming relaxation of the natural formulation (29)-(32)
of some specific instance of ULS.

[Figure: network of the fractional solution; the legible labels are (x, y) = (7, 0.25), (4, 0.19) and (6, 0.35) on three production arcs, above demands 7, 4 and 6]

Fig. 7. A fractional solution (n = 6)



Observation: As in the dynamic program, the solution decomposes into intervals
of periods between successive points with $s_t = 0$. Intervals in which all the
$y$ variables are integral, like interval [5,6] in Figure 7, cannot be cut off because
they correspond to integer (partial) solutions in $conv(X^{ULS})$. Therefore, we look
at intervals in which the corresponding $y$ variables are fractional, like interval
[2,2] in Figure 7, and try to cut off the corresponding point.

This can be done either by generating valid inequalities using known cutting
planes, or by generating new and specialized valid inequalities for the problem
at hand. We illustrate these two types of approach in turn on this very simple
problem and on the fractional interval [2,2] from Figure 7.
Valid Inequalities Using Known Cutting Planes. The fractional interval
[2,2] from Figure 7 corresponds to a fractional solution of the single node or
single period flow model defined by

$$\{ (x_2, y_2, s_1, s_2) \in \mathbb{R}_+ \times \{0,1\} \times \mathbb{R}_+^2 : \; s_1 + x_2 = d_2 + s_2, \; x_2 \le d_{2,n}\, y_2 \} \tag{62}$$

where $d_{2,n} = 28$ and $d_2 = 7$ in our toy example.

The well known flow cover inequalities have been defined for such single node
flow models, see Padberg, Van Roy and Wolsey [?]. Let the cover be $C = \{x_2\}$,
with excess capacity $\lambda = d_{2,n} - d_2$. The resulting flow cover inequality is

$$x_2 \le d_2 - (d_{2,n} - \lambda)(1 - y_2) + s_2, \quad \text{or equivalently} \tag{63}$$
$$x_2 \le d_2 y_2 + s_2, \tag{64}$$

which gives $x_2 \le 7 y_2 + s_2$ in our example.

This inequality is violated by the current fractional point from Figure 7 and
it can be easily generalized to

$$x_t \le d_t y_t + s_t \quad \text{for all } t \tag{65}$$

for arbitrary demand data and time period.



New Valid Inequalities. Consider again the fractional interval [2,2] from
Figure 7. In any feasible solution, if there is no production in period 2 (which
surely occurs if $y_2 = 0$), then the demand $d_2$ for period 2 must be produced
earlier and must be contained in the entering stock $s_1$.

Therefore, a logical implication is that $y_2 = 0$ implies $s_1 \ge d_2$. This implication
can be converted into the valid linear inequality $s_1 \ge d_2 (1 - y_2)$, which can be
interpreted as follows: if $y_2 = 0$ then $s_1 \ge d_2$, and if $y_2 = 1$ then $s_1 \ge 0$.

This inequality is violated by the current fractional point from Figure 7 and
can be easily generalized to

$$s_{t-1} \ge d_t (1 - y_t) \quad \text{for all } t \tag{66}$$

for arbitrary demand data and time period.
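Both violations can be checked by hand. The values below are read from Figure 7 ($x_2 = 7$, $y_2 = 0.25$) with $d_2 = 7$ as in the toy example; taking $s_1 = s_2 = 0$ is an assumption, justified by the fact that $[2,2]$ is an interval between two successive zero-stock points.

```latex
x_2 = 7 \;\not\le\; d_2 y_2 + s_2 = 7 \cdot 0.25 + 0 = 1.75,
\qquad
s_1 = 0 \;\not\ge\; d_2 (1 - y_2) = 7 \cdot 0.75 = 5.25
```

Written for $t = 2$, the flow cover inequality and the new inequality therefore both cut off this point, and by the same amount, $5.25$.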
Reoptimization and Generalization. After some violated valid inequalities
have been identified, they can be added to the initial formulation, and the
resulting tighter linear formulation solved.

Using the flow balance equality, $s_{t-1} + x_t = d_t + s_t$, one observes that
the 2 classes of valid inequalities found, (65) and (66), are equivalent. We add
these inequalities for $t = 2, 3, 4$ to the formulation, reoptimize, and get the
new non-integral solution pictured in Figure 8. Again, to find new violated valid
inequalities, it suffices to look at the solution for periods 1 up to 3 because the
partial solution for periods 4 to 6 is integral.



[Figure: network of the new fractional solution; the legible labels are x = 11.67, y = 1, s = 5.67; x = 5.33, y = 0.19; s = 8; and demands 6, 7, 4, 6, 3, 8]

Fig. 8. A fractional solution after some cuts (n = 6)



By the same type of logical arguments, one can derive the valid inequality
$s_1 \ge d_2 (1 - y_2) + d_3 (1 - y_2 - y_3)$, which says that $s_1$ must carry over $d_2$ if $y_2 = 0$
and must also carry over $d_3$ if both $y_2 = 0$ and $y_3 = 0$.

This inequality is violated by the current fractional point from Figure 8 and
it can be generalized to

$$s_{t-1} \ge \sum_{k=t}^{l} d_k (1 - y_t - \dots - y_k) \quad \text{for } 2 \le t \le l \le n \tag{67}$$
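The violation can again be checked by hand, as a sketch. We take $s_1 = 5.67$ and $y_2 = 0.19$ from Figure 8, $d_2 = 7$ and $d_3 = 4$ from the demand data, and we assume $y_3 = 0$ (no production is legible for period 3 in the figure).

```latex
d_2 (1 - y_2) + d_3 (1 - y_2 - y_3)
  = 7 \cdot 0.81 + 4 \cdot 0.81
  = 5.67 + 3.24
  = 8.91 \;>\; 5.67 = s_1
```

The first term alone equals $5.67$, so the inequality for $t = 2$ added in the previous round is tight, and the violation comes entirely from the $d_3$ term.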



Complete Linear Description. Once classes of valid inequalities have been
identified, the next natural questions to ask are, in sequence:

Question 4: Are the valid inequalities facet defining for $conv(X^{ULS})$?
Question 5: Do we have a complete linear description of $conv(X^{ULS})$?

We will not describe here the various techniques that can be used to prove that
some valid inequalities are facet defining, and that some valid inequalities suffice
to describe the convex hull of a problem. For a general presentation of these
topics, we refer the reader to Nemhauser and Wolsey [?]. We will rather state
the main results for the ULS problem, with adequate references for their proofs.

Using the flow balance constraints to eliminate the stock variables (i.e.
$s_t = \sum_{k=1}^{t} x_k - d_{1,t}$), the class of valid inequalities (67) is rewritten as

$$\sum_{k=1}^{t-1} x_k + \sum_{k=t}^{l} d_{k,l}\, y_k \ge d_{1,l} \quad \text{for } 2 \le t \le l \le n \tag{68}$$

These inequalities (68) do not suffice to describe $conv(X^{ULS})$, but they suffice
to solve most practical instances without any branching, i.e. the solution of the
linear relaxation of (28)-(32), augmented with the inequalities (68), is almost
always integral. We will see later why, and in which sense, this is the case (see
Section 3.5). To obtain a linear description of $conv(X^{ULS})$, the inequalities (68)
have to be further generalized, as shown in the following Theorem.

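Theorem 5 Assuming $d_t > 0$ for all $t$, $conv(X^{ULS})$ is described by

$$\sum_{i \in L \setminus S} x_i + \sum_{i \in S} d_{il}\, y_i \ge d_{1l} \quad \text{for } 1 \le l \le n, \; S \subseteq L = \{1, \dots, l\} \tag{69}$$
$$y_1 = 1 \tag{70}$$
$$\sum_{i=1}^{n} x_i = d_{1n} \tag{71}$$
$$x_i \ge 0, \; 0 \le y_i \le 1 \quad \text{for all } i \tag{72}$$

Bárány, Van Roy and Wolsey [?] give a primal-dual proof of Theorem 5. Pochet
and Wolsey [?] give a direct proof using a technique due to Lovász [40].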

Separation. We have obtained a complete linear programming formulation
in the original variables $(x, y)$ that solves ULS. Unfortunately this formulation
contains an exponential number of so-called $(l, S)$ inequalities (69). In a branch
and cut or cutting plane approach, instead of adding all these inequalities a priori
to the formulation, they are added in the course of optimization when they are
needed. We then need to solve the following separation problem.

Separation: Given a point $(x^*, y^*)$ in the linear relaxation of the natural
formulation (28)-(32) of ULS,

- either we find an $(l, S)$ inequality violated by $(x^*, y^*)$,

- or we prove that all $(l, S)$ inequalities are satisfied by $(x^*, y^*)$.

To find the most violated $(l, S)$ inequality for fixed $l \in \{1, \dots, n\}$, it suffices
to test whether or not $\sum_{i=1}^{l} \min\{x_i^*, d_{il}\, y_i^*\} < d_{1l}$.

- If this holds, then the $(l, S^*)$ inequality with $S^* = \{ i \in L : d_{il}\, y_i^* < x_i^* \}$ is
the most violated inequality for the given value of $l$.

- Otherwise, there is no violated $(l, S)$ inequality for the given value of $l$.

By enumerating all possible values of $l$, we obtain an $O(n^2)$ separation
algorithm for the $(l, S)$ inequalities.
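The separation routine just described can be sketched as follows. This is an illustrative Python version under stated assumptions: the function name, the tolerance `eps` and the 0-indexed list layout (index `i-1` holds period `i`) are ours, and the routine returns, for every `l` with a violated inequality, the set $S^*$ and the amount of violation.

```python
def separate_ls(x, y, d, eps=1e-9):
    """Return a list of (l, S, violation) for the most violated (l, S)
    inequality of each l = 1..n, skipping the values of l without violation.

    x, y: current fractional point; d: demands. All lists have length n.
    """
    n = len(d)
    cuts = []
    for l in range(1, n + 1):
        d_il = 0.0          # running value of d_{il} for i = l, l-1, ..., 1
        lhs = 0.0           # sum over i of min(x_i, d_{il} * y_i)
        S = []
        for i in range(l, 0, -1):
            d_il += d[i - 1]
            if d_il * y[i - 1] < x[i - 1]:
                S.append(i)                 # i in S*: use the term d_{il} y_i
                lhs += d_il * y[i - 1]
            else:
                lhs += x[i - 1]             # i in L \ S*: use the term x_i
        d_1l = d_il                         # after the loop, d_il equals d_{1l}
        if lhs < d_1l - eps:
            cuts.append((l, sorted(S), d_1l - lhs))
    return cuts
```

The inner loop accumulates $d_{il}$ while $i$ decreases, so each value of $l$ costs $O(l)$ operations and the whole routine runs in $O(n^2)$, in line with the bound stated above.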






3.5 Single Item Variants of Uncapacitated Lot-Sizing 

Using the same approach as for ULS, we describe here very briefly several ULS 
variants that have been studied in the literature, and for which complexity and 
reformulation results are available. 



Start-up Costs. We consider first the single item lot-sizing problem with start- 
up costs. This model is identical to ULS, with an additional startup cost incurred 
in the first period of a production sequence or batch. More precisely, the startup 
cost is incurred in period t, if there is a setup in period t, but not in period t—1. 
To introduce this startup cost model into the formulation, we define additional 
binary variables Zt taking the value 1 when there is a startup in period t. This 
leads to the formulation 



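$$\min \sum_{t=1}^{n} \{ p_t x_t + f_t y_t + h_t s_t + g_t z_t \} \tag{73}$$
$$s_{t-1} + x_t = d_t + s_t \quad \text{for all } t \tag{74}$$
$$s_0 = s_n = 0 \tag{75}$$
$$x_t \le M y_t \quad \text{for all } t \tag{76}$$
$$y_t \ge z_t \ge y_t - y_{t-1} \quad \text{for all } t \tag{77}$$
$$x_t, s_t \ge 0, \; y_t, z_t \in \{0,1\} \quad \text{for all } t \tag{78}$$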



where constraints (77) impose a startup in period $t$ if there is a setup in $t$, but
not in $t-1$, and also force a setup in period $t$ if there is a startup in period $t$.
We define ULSS as the optimization problem (73)-(78).
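The role of constraints (77) can be illustrated with a short Python sketch. The function names are hypothetical, and the value `y0` of the setup variable before the first period is an assumption (set to 0 by default), since the text does not specify it.

```python
def startups_from_setups(y, y0=0):
    """Smallest 0/1 vector z allowed by (77): z_t = 1 exactly when
    y_t = 1 and y_{t-1} = 0. `y0` is the assumed setup state before period 1."""
    z, prev = [], y0
    for yt in y:
        z.append(1 if yt == 1 and prev == 0 else 0)
        prev = yt
    return z


def satisfies_77(y, z, y0=0):
    """Check y_t >= z_t >= y_t - y_{t-1} for every period t."""
    prev = y0
    for yt, zt in zip(y, z):
        if not (yt >= zt >= yt - prev):
            return False
        prev = yt
    return True
```

With nonnegative startup costs $g_t$, an optimal solution can always use this smallest $z$, which is why the startup cost is paid only in the first period of each production sequence.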

Magnanti and Vachani show that the decomposition property of optimal
solutions given in Theorem 1 still holds. This allows one to define a dynamic
programming algorithm for solving ULSS, as in van Hoesel [71], with a direct
implementation running in $O(n^2)$. Again, more efficient implementations run in
$O(n \log n)$, see van Hoesel [71] and Aggarwal, Park [3].

The techniques used to generate compact linear formulations for ULS in
Section 3.3 can be extended to ULSS. In particular, Wolsey [?] provides a facility
location type formulation and van Hoesel, Wagelmans and Wolsey [72] provide
a proof that its linear relaxation solves ULSS, Rardin and Wolsey [61] prove
that the multicommodity linear formulation solves ULSS, and the technique of
Martin [44] can be used to generate a shortest path linear formulation from the
dynamic program solving ULSS. All these compact linear formulations involve
$O(n^2)$ variables and constraints.

A class of valid inequalities can easily be obtained starting from the $(l, S)$
inequalities of ULS. For example, starting with the $(l, S)$ inequality
$s_1 \ge d_2 (1 - y_2) + d_3 (1 - y_2 - y_3) + d_4 (1 - y_2 - y_3 - y_4)$, which is also valid in
the presence of startup costs, one can derive the stronger valid inequality
$s_1 \ge d_2 (1 - y_2) + d_3 (1 - y_2 - z_3) + d_4 (1 - y_2 - z_3 - z_4)$. This inequality says that
$s_1$ must carry over $d_2$ if $y_2 = 0$, must also carry over $d_3$ if $y_2 = z_3 = 0$ (which
implies that $y_3 = 0$, and thus there is no production in periods 2 and 3), and
must carry over $d_4$ when $y_2 = z_3 = z_4 = 0$ (which implies $y_3 = y_4 = 0$).

This inequality can be generalized to

$$s_{t-1} \ge d_t (1 - y_t) + \sum_{k=t+1}^{l} d_k (1 - y_t - z_{t+1} - \dots - z_k) \quad \text{for } 2 \le t \le l \le n. \tag{79}$$



However, as for ULS, the inequalities (79) do not suffice to obtain a linear
description of the convex hull of solutions, but they suffice to solve most practical
instances by linear programming, without any branching (see Section 3.5). A
further generalization of inequalities (79) proposed by van Hoesel, Wagelmans
and Wolsey [72] is proved to define the convex hull of feasible solutions to ULSS
in the initial variable space. This formulation contains an exponential number
of constraints that can be separated in $O(n^3)$ time.

Constant Capacity. We consider now the single item constant capacity
lot-sizing problem (CCLS) defined by (80)-(84). This constant capacity model is
quite realistic and occurs often in practice when the production, inventory and
demand are expressed in equivalent-time units.

$$\min \sum_{t=1}^{n} \{ p_t x_t + f_t y_t + h_t s_t \} \tag{80}$$
$$s_{t-1} + x_t = d_t + s_t \quad \text{for all } t \tag{81}$$
$$s_0 = s_n = 0 \tag{82}$$
$$x_t \le C y_t \quad \text{for all } t \tag{83}$$
$$x_t, s_t \ge 0, \; y_t \in \{0,1\} \quad \text{for all } t \tag{84}$$

The formulation of CCLS is identical to the natural formulation of ULS,
except that constraint (83) now expresses a limited capacity when $C < d_{1n}$. The
capacity limit $C$ is constant over time.

Florian and Klein [?] propose a dynamic programming algorithm
for CCLS running in $O(n^4)$ time, and van Hoesel, Wagelmans [73] reduce
this time to $O(n^3)$. The technique of Martin [33] can be used to generate a shortest
path linear formulation from the dynamic program solving CCLS with $O(n^3)$
variables and constraints. Pochet and Wolsey [56] give a linear programming
formulation with $O(n^3)$ variables and constraints solving CCLS that corresponds
to a shortest path sequence of regeneration intervals, but cannot be derived from
the dynamic program solving CCLS.
Pochet and Wolsey [56] also describe a new class of valid inequalities for
CCLS. This class was generalized to problems other than CCLS, and a new
procedure to obtain these inequalities by mixing the mixed-integer rounding
inequalities of Nemhauser and Wolsey [48] is described in Günlük and Pochet
[?]. We illustrate here this class of inequalities with an example. The reader is
referred to the above papers for a formal definition of these inequalities.

Consider the instance of CCLS with $C = 10$, $n = 5$ and demand $d =
(1, 7, 4, 2, 3)$ represented in Figure 9. First, by summing the flow conservation
constraints (81) from period 2 to some period $l \ge 2$, and by replacing the production
variables $x_t$ by their variable upper bounds $C y_t$, one gets the following starting
or base (valid but redundant) inequalities

$$s_1 + 10 y_2 \ge 7, \tag{85}$$
$$s_1 + 10 y_2 + 10 y_3 \ge 11, \tag{86}$$
$$s_1 + 10 y_2 + 10 y_3 + 10 y_4 + 10 y_5 \ge 16. \tag{87}$$

[Figure: five periods, each with capacity C = 10, and demands 1, 7, 4, 2, 3]

Fig. 9. A CCLS example (n = 5)

Then, standard mixed-integer rounding (MIR) inequalities (see [?] for a
general description of MIR inequalities) can be constructed from these base
inequalities. This gives

$$s_1 \ge 7 (1 - y_2), \tag{88}$$
$$s_1 \ge 1 (2 - y_2 - y_3), \tag{89}$$
$$s_1 \ge 6 (2 - y_2 - y_3 - y_4 - y_5). \tag{90}$$

Finally, the MIR inequalities can be mixed by taking differences of terms at the
right hand side. This gives, for example, the new inequalities

$$s_1 \ge 1 (2 - y_2 - y_3) + (6 - 1)(2 - y_2 - y_3 - y_4 - y_5) + (7 - 6)(1 - y_2), \tag{91}$$
$$s_1 \ge 1 (2 - y_2 - y_3) + (7 - 1)(1 - y_2), \tag{92}$$
$$s_1 \ge 6 (2 - y_2 - y_3 - y_4 - y_5) + (7 - 6)(1 - y_2). \tag{93}$$
For example, the second of these inequalities can be interpreted as follows. As
$d_{23} = 11$, if there is only one production period in periods 2 and 3 (i.e. $y_2 + y_3 = 1$,
so a capacity of 10 is available in periods 2 and 3), then $s_1$ must carry over at
least one unit in order to satisfy demand in periods 2 and 3. But if, in addition,
$y_2 = 0$, then $s_1$ must carry over the whole demand $d_2 = 7$ (instead of only 1 unit
when $y_2 = 1$ and $y_3 = 0$).
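The coefficients of the MIR inequalities (88)-(90) and the right hand sides of the mixed inequalities (91)-(93) can be reproduced with a small Python sketch. The helper names and the input layout are illustrative assumptions: a base inequality $s_1 + C \sum_{t \in T} y_t \ge b$ is passed as the pair `(b, T)`, and the rounding rule used is the standard MIR formula with $k = \lceil b/C \rceil$ and $r = b - C(k-1)$.

```python
import math


def mir_from_base(rhs, C):
    """For the base inequality s + C * sum(y) >= rhs, return (r, k) such that
    the MIR inequality reads s >= r * (k - sum(y))."""
    k = math.ceil(rhs / C)
    r = rhs - C * (k - 1)
    return r, k


def mixed_mir_rhs(bases, C, y):
    """Evaluate the right hand side of the mixed MIR inequality at a 0/1 point.

    bases: list of (rhs, periods) describing the base inequalities to mix;
    y: dict mapping a period to its setup value.
    """
    terms = []
    for rhs, periods in bases:
        r, k = mir_from_base(rhs, C)
        terms.append((r, k - sum(y[t] for t in periods)))
    terms.sort()                      # mix in order of increasing r
    total, previous_r = 0, 0
    for r, slack in terms:
        total += (r - previous_r) * slack
        previous_r = r
    return total
```

Mixing the base inequalities (85) and (86) gives (92), and evaluating its right hand side at $y_2 = 0$, $y_3 = 1$ returns 7, the full demand $d_2$ discussed above.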

Such mixed MIR inequalities do not suffice to describe the convex hull of
solutions for CCLS by linear inequalities only, but they suffice to solve most
practical instances without branching (see Section 3.5). Examples of facet defining
inequalities that are not of the type (91)-(93) are given in Pochet and Wolsey
[56]. However, in this case, no complete linear description of the convex hull of
solutions is known for CCLS.

Leung, Magnanti and Vachani [?], Pochet [?], and Loparic, Marchand and
Wolsey [?] define and test valid inequalities for the case where the capacities
may vary over time.



Backlogging. Another variant of ULS is to allow for backlogging. To model
this situation, we introduce a new variable $r_t$, for each period $t$, representing
the backlog of demand at the end of period $t$. So, $r_t$ is the demand from periods
$\{1, \dots, t\}$ that will be delivered late (in some period in $\{t+1, \dots, n\}$). A classical
formulation for the uncapacitated lot-sizing problem with backlogging (BLS) is
(94)-(98)

$$\min \sum_{t=1}^{n} \{ p_t x_t + f_t y_t + h_t s_t + g_t r_t \} \tag{94}$$
$$(s_{t-1} - r_{t-1}) + x_t = d_t + (s_t - r_t) \quad \text{for all } t \tag{95}$$
$$s_0 = s_n = r_0 = r_n = 0 \tag{96}$$
$$x_t \le M y_t \quad \text{for all } t \tag{97}$$
$$x_t, s_t, r_t \ge 0, \; y_t \in \{0,1\} \quad \text{for all } t \tag{98}$$



where the flow conservation constraints (95) are adapted to take into account
these backlogging variables. In each period $t$, $r_t$ represents a flow coming from
period $t+1$ to period $t$ (to satisfy artificially some demand in periods at or
before $t$). Constraints (96) impose that there is no initial and final inventory and
backlog. It is assumed in (94) that each unit of product backlogged at the end of
period $t$ costs $g_t$.

Zangwill [85] describes a dynamic programming algorithm for
BLS running in $O(n^2)$ time, and Wagelmans, van Hoesel, Kolen [?] obtain an
implementation in $O(n \log n)$ time. Pochet and Wolsey [54] provide facility
location, multicommodity and shortest path linear formulations involving $O(n^2)$
variables and constraints. 
Pochet and Wolsey [54] describe a new class of valid inequalities for BLS.
This class, which can also be applied to problems other than BLS, gives a way
to generalize the class of valid inequalities defined in Van Roy and Wolsey [?]
for uncapacitated fixed charge network flow problems. We illustrate this class of
inequalities with a simple example. The reader is referred to the above papers
for a formal definition of these inequalities.

Consider an instance of BLS with $n \ge 5$, represented in Figure 10.
First we define a subset of the inventory and backlog variables, for example
$\{s_1, s_3, r_2, r_4\}$, and then we find logical conditions under which some demand has
to flow through the selected variables. For example, if $y_2 = 0$ then $s_1 + r_2 \ge d_2$,
giving the base valid inequality $s_1 + r_2 \ge d_2 (1 - y_2)$. Similarly, one can obtain
the base valid inequalities $s_1 + r_4 \ge d_3 (1 - y_2 - y_3 - y_4)$ and $s_3 + r_4 \ge d_4 (1 - y_4)$.
Finally we put all these inequalities together to obtain the valid inequality

$$s_1 + s_3 + r_2 + r_4 \ge d_2 (1 - y_2) \tag{99}$$
$$\qquad + \; d_3 (1 - y_2 - y_3 - y_4) \tag{100}$$
$$\qquad + \; d_4 (1 - y_4) \tag{101}$$




Again, the inequalities illustrated above do not suffice to obtain a linear
description of the convex hull of solutions to BLS (such a description is unknown
for general objective functions), but they suffice to solve most practical instances
of BLS without branching (see Section 3.5).

Wagner-Whitin Costs. We analyze in this section the special case of the
previously studied lot-sizing problems where the production, inventory and
backlogging costs satisfy the Wagner-Whitin condition. An instance of ULS, ULSS,
CCLS or BLS is said to have Wagner-Whitin costs if $p_t + h_t \ge p_{t+1}$ and
$p_{t+1} + g_t \ge p_t$ for all t. This means that the unit production, unit inventory and
unit backlogging costs are non-speculative in the sense that

— if a demand $d_t$ is satisfied from production in period t or before, then this
production occurs as late before t as possible (with respect to setup decisions
and available capacity) because producing and stocking in earlier periods
costs more than producing later ($p_t + h_t \ge p_{t+1}$),

— if a demand $d_t$ is satisfied from production in period t or after, then this
production occurs as early after t as possible (with respect to setup decisions
and available capacity) because producing and backlogging from later
periods costs more than producing earlier ($p_{t+1} + g_t \ge p_t$).
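
The two inequalities can be checked mechanically for a given cost vector. The sketch below is only an illustration: the function name, the 0-based period indexing and the sample cost data are assumptions and do not come from the text.

```python
def has_wagner_whitin_costs(p, h, g=None):
    """Check p[t] + h[t] >= p[t+1] and, with backlogging, p[t+1] + g[t] >= p[t].

    p, h, g are lists of unit production, inventory and backlogging costs,
    indexed by period (0-based here). Pass g=None for models without backlogging.
    """
    for t in range(len(p) - 1):
        if p[t] + h[t] < p[t + 1]:
            return False          # cheaper to produce early and stock
        if g is not None and p[t + 1] + g[t] < p[t]:
            return False          # cheaper to produce late and backlog
    return True


# Hypothetical cost data.
print(has_wagner_whitin_costs([4, 4, 4], [1, 1, 1]))              # True
print(has_wagner_whitin_costs([1, 5, 1], [1, 1, 1]))              # False
print(has_wagner_whitin_costs([5, 1, 5], [9, 9, 9], [1, 1, 1]))   # False
```

In the last call the inventory condition holds but backlogging from period 2 (cost 1 + 1) is cheaper than producing in period 1 (cost 5), so the costs are speculative.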

The implications of this non-speculative cost condition are that

— the complexity bounds for the optimization of the subproblems ULS, ULSS,
CCLS and BLS are reduced,

— new and smaller (in number of variables and constraints) linear reformulations
can be derived based on the fact that there always exist stock minimal
and backlog minimal solutions (i.e. each stock and backlog variable is set to
its minimal possible value with respect to setup decisions and capacity
constraints) to these problems,

— fewer constraints are needed in the complete linear description of the convex
hull of feasible solutions in the initial space, and the complexity bounds of
the separation algorithms are reduced.



Furthermore, Federgruen and Tzur, Wagelmans, van Hoesel and Kolen, and
Aggarwal and Park have shown that the dynamic programs solving ULS and ULSS
can be implemented in $O(n)$ time when the cost function satisfies the
Wagner-Whitin assumption.

For the linear reformulation, in the case of ULS, we have seen that inequalities
(1^ do not suffice to obtain a linear description of $\mathrm{conv}(X^{ULS})$, but they suffice
to solve ULS by linear programming with a Wagner-Whitin objective function.
This means that in this case there always exists an optimal solution with y
integer to the linear relaxation of ([^ - (1^21) augmented by (IBHl). Moreover, this
Wagner-Whitin condition is a very natural assumption that is almost always
satisfied by the objective functions encountered in practice. This is the reason
why it generally suffices to add the inequalities (l6^ to solve most ULS problems
by LP.

Similarly, there always exists an optimal solution with y integer to the linear
relaxation of problem CH-CZHD. plus (79). Therefore, adding inequalities (79)
suffices to solve ULSS by LP. Similar results hold for problems CCLS and BLS.

We summarize in Table 2 the results about the linear reformulations of problems
ULS, ULSS, CCLS and BLS. See Pochet and Wolsey for a detailed
description of the formulations and separation algorithms under the
Wagner-Whitin cost assumption. We indicate in the table the sizes and complexity
bounds for general cost functions, and the size/complexity reduction obtained, if
any, with Wagner-Whitin costs (after the ->ww sign). In the table, ?? indicates
that no result is known, to the best of our knowledge. These results are extended
to the uncapacitated lot-sizing problem with backlogging and startup costs in
Agra and Constantino [1].

Table 2. Size/complexity reductions with Wagner-Whitin costs





                           Extended                Initial space
                           linear formulation      linear formulation

Uncapacitated (ULS)
  number of variables      O(n^)                   O(n)
  number of constraints    O(n^)                   O(2^n) ->ww O(n^2)
  separation complexity                            O(n^)

Startup costs (ULSS)
  number of variables      O(n^)                   O(n)
  number of constraints    O(n^)                   O(2^n) ->ww O(n^2)
  separation complexity                            O(n^) ->ww O(n^)

Constant capacity (CCLS)
  number of variables      O(n^) ->ww O(n^)        O(n)
  number of constraints    O(n^) ->ww O(n^)        ?? ->ww O(2^n)
  separation complexity                            ?? ->ww O(n^ log n)

Backlogging (BLS)
  number of variables      O(n^) ->ww O(n)         O(n)
  number of constraints    O(n^)                   ?? ->ww O(2^n)
  separation complexity                            ?? ->ww O(n^)



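Profit Maximization. As a last variant of the ULS problem, we consider the
profit maximization problem PLS formulated as

\begin{align}
\max \quad & \sum_t \pi_t v_t - \sum_t \{ p_t x_t + f_t y_t + h_t s_t \} \tag{102} \\
& s_{t-1} + x_t = v_t + s_t && \text{for all } t \tag{103} \\
& s_0 = s_n = 0 \tag{104} \\
& x_t \le M y_t && \text{for all } t \tag{105} \\
& 0 \le v_t \le u_t && \text{for all } t \tag{106} \\
& x_t, s_t \ge 0, \; y_t \in \{0,1\} && \text{for all } t \tag{107}
\end{align}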



where the variable $v_t$, representing the amount sold in period t, replaces the
fixed demand in the flow balance constraint (103), and is bounded from above by
the maximum forecast market potential $u_t$ in constraint (106). The objective
function (102) represents the profit contribution to be maximized, defined
as the revenue from sales at a price $\pi_t$ per unit sold in period t, minus production
and inventory costs.

Such a problem arises typically as a subproblem in profit maximization multi
item production planning problems, where there are capacity linking constraints
between the items. For each single item subproblem, the production costs in
each period contain implicitly, or even explicitly when the problem is solved
by decomposition or Lagrangean relaxation, the opportunity cost of the capacity
used for production. The problem consists in allocating the capacity to the items
in order to maximize overall profit contribution.

Loparic, Pochet and Wolsey [38] analyze the optimization algorithms and
linear formulations for problem PLS with additional lower bounds on stocks to
model required safety stock levels. They show that the problem can be solved in
O(n^) by dynamic programming, they propose linear formulations with O(n^)
variables and constraints, and they provide a complete linear description with
an exponential number $O(2^n)$ of constraints of the convex hull of solutions in
the initial variable space. Again the problem can be solved by LP with only
a subset of these constraints - but still exponentially many - when the cost
function satisfies the Wagner-Whitin assumption.

We only illustrate here the type of stock minimal inequalities needed to solve
PLS by LP in the presence of such Wagner-Whitin costs. The inequality

$$ s_1 \ge (v_2 - u_2 y_2) + (v_4 - u_4 (y_2 + y_3 + y_4)) \tag{108} $$

is valid for the convex hull of solutions to (103)-(107) and it is a slight
generalisation of the inequality

$$ s_1 \ge (d_2 - d_2 y_2) + (d_4 - d_4 (y_2 + y_3 + y_4)) \tag{109} $$

needed to describe $\mathrm{conv}(X^{ULS})$. The inequality (108) expresses that if $y_2 = 0$,
then $x_2 = 0$ and $s_1$ must cover the sales level $v_2$, and if $y_2 = 1$ then there is
no restriction on $s_1$ because $v_2 \le u_2$ can be covered by $x_2$. Therefore,
$s_1 \ge (v_2 - u_2 y_2)$ is valid. Similarly, $s_1 \ge v_4$ when there is no production in
periods 2, 3 and 4, that is when $y_2 + y_3 + y_4 = 0$.



4 Formulations of Capacitated Models 

We analyze in this section the main classes of capacitated production planning
models. We first classify the models, and then describe in more detail the basic
mathematical programming models and formulations for the two main model
classes, the big buckets and small buckets models.

4.1 Classes of Capacitated Models 

The first important shortcoming of the classical MRP approach in discrete
manufacturing is that it decomposes the planning procedure into two phases. First the
uncapacitated planning problem is solved, to obtain production or purchase plans
for all items in the product structure. Then the capacity problems (overloads)
created by the production plans are resolved by smoothing the load profiles, i.e.
by shifting some of the production lots forward or backward in time. This lack
of integration between the uncapacitated planning phase and the capacitated
post-optimization phase explains why there is no guarantee of obtaining a feasible
production plan even if one exists, and why the production plans obtained are
suboptimal in terms of cost. There is thus a need to develop capacitated models
allowing one to obtain feasible and optimized production plans, and providing
the needed flexibility and tools to adapt the production plans to demand and to
market requirements.

There are two main classes of capacitated models that can be distinguished 
based on the size, or time length, of the production periods. These model classes 
are known as the big buckets and small buckets models, and correspond to quite 
different planning environments and problems that we describe now. 

Discrete time planning models are models where the planning horizon is
divided into discrete time periods, and where all events (production, sales, ...)
occurring during a period are aggregated in time as if they all occurred at the same
time. All the models that we have studied so far are of this type.

Big buckets planning models are discrete time models with large time pe- 
riods where the typical aim is to take global decisions about the resources to 
be used and to acquire, and about the assignment of products to facilities or 
departments, rather than to plan the production to meet some precise and firm 
short term customer orders, or to control the short term output from the shop 
floor. Therefore, there is usually no need for a detailed representation of time. 
Moreover, a more detailed time representation would very often not make
sense because the required detailed information (e.g. detailed sales forecasts)
would not be available in time.

Such models are used for solving tactical medium term resource planning 
problems. The size of the time period is usually taken as one week, or one 
month, depending on the particular problem, and the planning horizon ranges 
from a few weeks to months, up to one year. 

Small buckets planning models are discrete time models with small time
periods where the aim is to take detailed decisions about the materials flow on the
shop floor and the sequence of production lots on machines, in order to control
short term output and capacity usage and to verify that technical precedence
constraints between lots can be satisfied. A detailed time representation is needed
to represent these events accurately.

Such models are used for solving operational short term materials planning 
problems. The size of the time period is usually taken as one week, or one day, 
down to one hour, depending on the particular problem, and the planning horizon 
ranges from a few weeks down to one day. In some cases, similar models have to 
be used for longer term problems with larger time periods when, for instance, 
the detailed sequence of products has a significant impact on the medium term 
capacity utilization. This is for example often the case in the process or chemical 
industry where change-over times may represent days of production lost. 

For simplicity, we will consider that big buckets models are discrete time
models where there is no need to represent the sequence of events inside a period
and between the different periods. Small buckets models are models where it is
required to represent the sequence of events from one period to the other.

4.2 Big Buckets Models: Capacitated Lot-Sizing 

Basic Big Buckets Multi-item Model. The master production scheduling
model <m-m formulated in Section E2l, also known as the multi-item
capacitated lot-sizing problem, is the basic example of a big buckets model.
We formulate here the simplified version without setup times, called MICLS,
that has been widely studied in the literature.



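\begin{align}
\min \quad & \sum_i \sum_t \{ p^i_t x^i_t + f^i_t y^i_t + h^i_t s^i_t \} \tag{110} \\
& s^i_{t-1} + x^i_t = d^i_t + s^i_t && \text{for all } i, t \tag{111} \\
& x^i_t \le C^i_t y^i_t && \text{for all } i, t \tag{112} \\
& \sum_i a^i x^i_t \le L_t && \text{for all } t \tag{113} \\
& x^i_t, s^i_t \ge 0, \; y^i_t \in \{0,1\} && \text{for all } i, t \tag{114}
\end{align}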



where constraint (112) represents a single item capacity constraint and constraint
(113) is the multi-item capacity linking constraint. Note that if $C^i_t = L_t / a^i$, then
constraint (112) is just used to define the setup variable $y^i_t$ without imposing
any capacity constraint other than the linking constraint (113).

Several optimization approaches have been proposed and tested for problem
MICLS. They are based on the single item uncapacitated ULS subproblems, and
use the optimization and reformulation results surveyed in Section |2l. Thizy and
Van Wassenhove [67] and Chen and Thizy [10] test and compare several Lagrangean
relaxation schemes, Eppen and Martin [19] solve the problem by branch and
bound using extended linear formulations for the ULS subproblems, Barany,
Van Roy and Wolsey and Pochet and Wolsey [55] use a cut and branch approach
(cuts are added only before the branch and bound enumeration), and Constantino
solves MICLS by a branch and cut approach.



Valid Inequalities Using Known Cutting Planes. As we did in Section IHTTl
we illustrate how to identify classes of valid inequalities for problem MICLS using
known cutting planes. This approach can be used to tighten the formulation and
solve such problems using branch and cut approaches.

The first idea is to use the mixed integer rounding (MIR) inequalities from
Nemhauser and Wolsey [-> Martin]. Marchand and Wolsey show how to
generate MIR inequalities from 0-1 knapsack problems with continuous variables.
To use this idea, it suffices to construct relaxations of MICLS in the form of 0-1
knapsacks with continuous variables.

We illustrate this approach with an example. First we construct a single item
knapsack relaxation for item i by summing the flow conservation constraints
(111) over periods $t = k, \ldots, l$, replacing $s^i_l$ by its lower bound 0, and replacing
$x^i_t$ by its upper bound $\min[C^i_t, d^i_{tl}]\, y^i_t$ for $t = k, \ldots, l$. This gives the following
knapsack relaxation with one continuous variable.

\begin{align}
& s^i_{k-1} + \sum_{t=k}^{l} \min[C^i_t, d^i_{tl}]\, y^i_t \ge d^i_{kl} \tag{115} \\
& s^i_{k-1} \ge 0, \; y^i_t \in \{0,1\} && \text{for } t = k, \ldots, l \tag{116}
\end{align}

where as before we use the notation $d^i_{kl} = \sum_{t=k}^{l} d^i_t$.

Suppose now that we have solved the linear relaxation of MICLS and we
want to cut off the fractional solution obtained and illustrated in Figure 11 for
one item i and for periods 2 up to 5.




Fig. 11. A fractional solution for one item in MICLS 



The corresponding knapsack relaxation is (where we have dropped the item
i superscript for simplicity)

$$ s_1 + 10 y_2 + 9 y_3 + 7 y_4 + 3 y_5 \ge 14, \tag{117} $$

which can be relaxed into (where we consider variable $y_5$ as continuous)

$$ (s_1 + 3 y_5) + 10 (y_2 + y_3 + y_4) \ge 14. \tag{118} $$

We obtain the classical MIR inequality

$$ \frac{s_1}{4} + \frac{3}{4} y_5 + y_2 + y_3 + y_4 \ge 2 \tag{119} $$

by dividing (118) by 10 and rounding. The inequality (119) is violated by the
fractional point in Figure 11.
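
The rounding step can be checked numerically. Dividing the relaxed knapsack by 10 gives a right-hand side of 1.4 with fractional part 0.4, so the continuous term is divided by 10 x 0.4 = 4 and the right-hand side is rounded up to 2. The sketch below enumerates the binary setup variables and a grid of values for the stock to confirm that no feasible point of the knapsack relaxation violates the MIR inequality. The function names, the grid and the fractional test point are assumptions made for illustration; the fractional point is not the one shown in the figure.

```python
from itertools import product


def knapsack_ok(s1, y2, y3, y4, y5):
    """Knapsack relaxation: s1 + 10 y2 + 9 y3 + 7 y4 + 3 y5 >= 14."""
    return s1 + 10 * y2 + 9 * y3 + 7 * y4 + 3 * y5 >= 14


def mir_ok(s1, y2, y3, y4, y5, eps=1e-9):
    """MIR inequality: s1/4 + (3/4) y5 + y2 + y3 + y4 >= 2."""
    return s1 / 4 + 0.75 * y5 + y2 + y3 + y4 >= 2 - eps


def mir_is_valid(step=0.25, s_max=20.0):
    """True if no grid point feasible for the knapsack violates the MIR cut."""
    for y2, y3, y4, y5 in product((0, 1), repeat=4):
        for k in range(int(s_max / step) + 1):
            s1 = k * step
            if knapsack_ok(s1, y2, y3, y4, y5) and not mir_ok(s1, y2, y3, y4, y5):
                return False
    return True


print(mir_is_valid())                    # True
# Hypothetical fractional point: feasible for the knapsack, cut off by MIR.
print(knapsack_ok(0, 1, 0.5, 0, 0))      # True  (10 + 4.5 >= 14)
print(mir_ok(0, 1, 0.5, 0, 0))           # False (1 + 0.5 < 2)
```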

Another idea is to use the well known flow cover inequalities that have
been defined for single node flow models in Padberg, Van Roy and Wolsey
(see [-> Martin] for a general description of flow cover inequalities). Again, it
suffices to construct single node flow relaxations, and to generate the corresponding
flow cover inequalities.

Consider first the single node flow relaxation defined for a single item i over
several consecutive periods, and obtained by summing the flow conservation
constraints (111) over periods $t = k, \ldots, l$, replacing $s^i_{k-1}$ by its lower bound 0.
This gives the following single node flow model.

\begin{align}
& \sum_{t=k}^{l} x^i_t \le d^i_{kl} + s^i_l \tag{120} \\
& x^i_t \le \min[C^i_t, d^i_{tl}]\, y^i_t + s^i_l && \text{for } t = k, \ldots, l \tag{121} \\
& s^i_l \ge 0, \; x^i_t \ge 0, \; y^i_t \in \{0,1\} && \text{for } t = k, \ldots, l \tag{122}
\end{align}

where we have changed the variable upper bound constraint in (112) by adding
the flow $s^i_l$ at the right hand side, allowing us to reduce the coefficient of $y^i_t$.

Again, suppose that we are given the fractional solution illustrated in Figure
11. The corresponding single node flow relaxation is

\begin{align}
& x_2 + x_3 + x_4 + x_5 \le 14 + s_5 \tag{123} \\
& x_2 \le 10 y_2 + s_5, \quad x_3 \le 9 y_3 + s_5, \quad x_4 \le 7 y_4 + s_5, \quad x_5 \le 3 y_5 + s_5 \tag{124}
\end{align}

By projecting $s_5$ to zero, we obtain the classical flow cover inequality

$$ x_2 + x_4 \le 14 - (10 - 3)(1 - y_2) - (7 - 3)(1 - y_4) \tag{125} $$

using the cover $C = \{2, 4\}$ and the excess capacity of the cover $\lambda = 3$. We obtain
finally the valid inequality

$$ x_2 + x_4 \le 14 - (10 - 3)(1 - y_2) - (7 - 3)(1 - y_4) + s_5 \tag{126} $$

by adding the flow $s_5$ at the right hand side of the inequality (125) to lift back
the variable $s_5$, i.e. to make it valid even when $s_5 > 0$. The inequality (125) is
violated by the fractional point in Figure 11.
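
The coefficients of this cut follow the usual flow cover pattern: for a cover C with capacities $m_j$ and excess $\lambda = \sum_{j \in C} m_j - b > 0$, the cut reads $\sum_{j \in C} x_j \le b - \sum_{j \in C} (m_j - \lambda)^+ (1 - y_j)$. The sketch below computes the right-hand side for the numbers of the example. The function name, the data layout and the fractional point are assumptions made for illustration; the fractional point is not the one shown in the figure.

```python
def flow_cover_rhs(capacities, b, cover, y):
    """Right-hand side b - sum_{j in C} max(m_j - lam, 0) * (1 - y_j).

    capacities: dict j -> m_j (variable upper bound of x_j),
    b: right-hand side of the aggregate flow constraint,
    cover: set C whose total capacity exceeds b,
    y: dict j -> value of the setup variable y_j.
    """
    lam = sum(capacities[j] for j in cover) - b
    if lam <= 0:
        raise ValueError("C is not a cover: its total capacity must exceed b")
    return b - sum(max(capacities[j] - lam, 0) * (1 - y[j]) for j in cover)


caps = {2: 10, 3: 9, 4: 7, 5: 3}   # variable upper bounds of the example
cover = {2, 4}                     # excess capacity 10 + 7 - 14 = 3

# Both setups at 1: the cut reduces to x2 + x4 <= 14.
print(flow_cover_rhs(caps, 14, cover, {2: 1, 4: 1}))    # 14
# No setup in period 2: x2 = 0 and the cut gives x4 <= 7.
print(flow_cover_rhs(caps, 14, cover, {2: 0, 4: 1}))    # 7
# Hypothetical fractional point x2 = 7 (y2 = 0.7), x4 = 7 (y4 = 1):
# x2 + x4 = 14 exceeds the right-hand side of about 11.9, so it is cut off.
print(flow_cover_rhs(caps, 14, cover, {2: 0.7, 4: 1.0}))
```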

Consider next another single node flow relaxation obtained by taking the
linking capacity constraint (113) for a given period t, and the single item capacity
constraints (112) for all items in the same period t. This gives the following single
node flow relaxation

\begin{align}
& \sum_i (a^i x^i_t) \le L_t \tag{127} \\
& (a^i x^i_t) \le (a^i C^i_t)\, y^i_t && \text{for all } i \tag{128} \\
& (a^i x^i_t) \ge 0, \; y^i_t \in \{0,1\} && \text{for all } i \tag{129}
\end{align}

where the continuous flow variables are the variables $(a^i x^i_t)$ with an individual
upper bound of $(a^i C^i_t)$.

Suppose now that we have solved the linear relaxation of MICLS and we
want to cut off the fractional solution obtained and illustrated in Figure 12 for
all items and for a given single period.



Fig. 12. A fractional solution for one period, 3 items in MICLS



The corresponding single node flow model is (where we have dropped the
time index t for simplicity)

\begin{align}
& x^1 + 2x^2 + 3x^3 \le 20 \tag{130} \\
& x^1 \le 5 y^1, \quad 2x^2 \le 2(4 y^2), \quad 3x^3 \le 3(3 y^3). \tag{131}
\end{align}

We obtain the classical flow cover inequality

$$ x^1 + 2x^2 + 3x^3 \le 20 - (5 - 2)(1 - y^1) - (8 - 2)(1 - y^2) - (9 - 2)(1 - y^3) \tag{132} $$

using the cover $C = \{1, 2, 3\}$ and the excess capacity of the cover $\lambda = 2$. This
inequality is violated by the fractional point in Figure 12.



Big Buckets Multi-item Model with Setup Times. The classical and
most studied MICLS problem as formulated in (110)-(114) does not contain any
setup time; setups are only taken into account through a penalty cost in the
objective (110).

Therefore, this setup cost $f^i_t$ must contain the opportunity cost of the setup
activity for item i in period t. The opportunity cost is defined as the additional or
marginal cost incurred because of the setup activity. It is very difficult to estimate
a priori because it depends on problem data, on capacity utilization, and on the
capacity available and the productivity in period t. Hence, because of the difficulty
of estimating the setup cost $f^i_t$, the true optimality of the solution to MICLS is not
guaranteed.

Moreover, the setup activity consumes part of the capacity available, and
influences the productivity. Therefore, it is not even guaranteed that the solution
obtained from MICLS is really feasible because the capacity consumed by the
setup activity is neglected.

In order to improve the reliability of the solution from MICLS, both in terms
of feasibility and optimality, it is important to model more accurately the
productivity, i.e. the capacity used as a function of the production lot sizes. In the
presence of economies of scale in production, it is better to represent the capacity
used by the setup activity directly in the model, and not only implicitly through
the setup cost. The linking capacity constraint in period t
should then be modelled as

$$ \sum_i a^i x^i_t + \sum_i \beta^i y^i_t \le L_t \tag{133} $$

where $\beta^i$ represents the capacity used by the setup of item i, and is often called
the setup time of item i. We call MICLSS the multi item model MICLS where
the capacity constraint with setup times (133) replaces the original capacity
constraint (113).
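
The feasibility issue can be seen on a one-period example. The sketch below evaluates the left-hand side of the capacity constraint with and without setup times for the same production plan; all data (three items, unit capacity usage, setup times, lot sizes and the capacity of 20) are hypothetical and chosen only for illustration.

```python
def capacity_used(a, beta, x, y):
    """Capacity consumed in one period: sum_i a[i]*x[i] + sum_i beta[i]*y[i]."""
    return sum(a[i] * x[i] + beta[i] * y[i] for i in range(len(x)))


# Hypothetical one-period data for three items.
a = [1, 2, 3]        # capacity used per unit produced
beta = [2, 2, 2]     # setup time of each item
x = [4, 3, 3]        # production lot sizes
y = [1, 1, 1]        # setup decisions
L = 20               # capacity available in the period

without_setup_times = capacity_used(a, [0, 0, 0], x, y)
with_setup_times = capacity_used(a, beta, x, y)
print(without_setup_times, without_setup_times <= L)   # 19 True
print(with_setup_times, with_setup_times <= L)         # 25 False
```

The plan fits the capacity when setups are only penalized in the objective, but it is infeasible once the capacity consumed by the three setups is counted.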

Very few optimization methods have been proposed and tested for MICLSS.
Trigeiro, Thomas and McClain, and Diaby, Bahl, Karwan and Zionts test
Lagrangean relaxation heuristic approaches - with performance guarantee - based
on the solution of the ULS subproblems and the relaxation of the capacity linking
constraints.

To tighten the basic formulation of MICLSS beyond the complete description
of the ULS subproblems, other relaxations or subproblems have to be studied.
Goemans [23] proposes generalized flow cover valid inequalities to tighten the
formulation of the single period t multi item relaxation defined by

\begin{align}
& \sum_i a^i x^i_t + \sum_i \beta^i y^i_t \le L_t \tag{134} \\
& 0 \le x^i_t \le M y^i_t && \text{for all } i \tag{135} \\
& y^i_t \in \{0,1\} && \text{for all } i \tag{136}
\end{align}

Constantino provides a complete linear description of this single period
relaxation when the setup times are constant for all items, i.e. $\beta^i = \beta$ for all
i. These tightened formulations have been tested on multi-item single-period
subproblems, but no tests of these improved formulations on multi-item problems
with setup times (MICLSS) have been reported in the literature.

Another single period relaxation of MICLSS that has been studied tries to
combine the linking capacity constraints and the demand satisfaction for each
item. It is formulated as










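\begin{align}
& \sum_i a^i x^i_t + \sum_i \beta^i y^i_t \le L_t \tag{137} \\
& s^i_{t-1} + x^i_t \ge d^i_t && \text{for all } i \tag{138} \\
& 0 \le x^i_t \le M y^i_t && \text{for all } i \tag{139} \\
& s^i_{t-1} \ge 0, \; y^i_t \in \{0,1\} && \text{for all } i \tag{140}
\end{align}

Miller, Nemhauser and Savelsbergh define cover and reverse cover valid
inequalities for the one period subproblem (137)-(140), and test their
effectiveness in solving MICLSS instances by branch and cut.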



4.3 Small Buckets Models 

Small buckets models are used when changing the setup of the machines has a 
significant impact on the capacity used and/or on the production costs. Typically 
in these models, on each machine, we keep on producing the same item for a 
number of periods, and we only incur a setup cost/time when the setup of the 
machine is changed from one item to another. 

When the capacity used or the cost incurred during the setup modification
does not depend on the sequence of items, but only on the fact that the machine setup
is changed, we speak about startup time or startup cost models. When the
cost or time lost during the setup depends on the sequence of products, i.e. on
the products produced before and after the setup modification, we speak about
change over costs and times.

We describe here the basic models, formulations and applications for such 
small buckets planning problems with startup costs and times, or change over 
costs and times. 



Basic Small Buckets Multi-item Model, One Setup per Period. The
natural and classical single machine formulation to represent the setup, startup
and change over decisions in a small buckets planning model allows one to set up
only one item per period. If the setup of the machine is changed from period
$t - 1$ to period t, this change is supposed to occur at the beginning of period t.
This means that only one item can be produced in any period, the one for which
the machine is set up in the period.

The variable $y^i_t$ is used to represent the setup decision; it takes the value 1
when the machine is set up to produce item i during the whole period t, and 0
otherwise.

The variable $z^i_t$ is the startup variable and takes the value 1 when the machine
starts producing item i in period t, and was not set up for item i in period $t - 1$,
and 0 otherwise.

The variable $w^{ij}_t$ is the change over variable and takes the value 1 when the
machine starts producing item j in period t and was set up for item i in period
$t - 1$, and 0 otherwise.

Independently of the production levels and the demand satisfaction part of
the problem, the natural formulation used to represent these setup, startup and
change over decisions is the following

\begin{align}
& \sum_i y^i_t = 1 && \text{for all } t \tag{141} \\
& z^i_t \ge y^i_t - y^i_{t-1} && \text{for all } i, t \tag{142} \\
& w^{ij}_t \ge y^i_{t-1} + y^j_t - 1 && \text{for all } i, j, t \tag{143} \\
& w^{ij}_t, z^i_t, y^i_t \in \{0,1\} && \text{for all } i, j, t \tag{144}
\end{align}

where constraint (141) imposes that the machine is set up for exactly one item per
period, constraint (142) forces the startup of item i in period t when $y^i_t = 1$ and
$y^i_{t-1} = 0$, and constraint (143) forces the change over from item i in period $t - 1$ to
item j in period t when $y^i_{t-1} = y^j_t = 1$.
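
The role of the three families of variables can be made concrete by deriving them from a given machine schedule. The sketch below takes a sequence of setups (one item per period, with period 0 as the initial status) and sets the startup and change over variables to the smallest values allowed by the startup and change over constraints. The function name, the item labels and the schedule are assumptions made for illustration.

```python
def setup_variables(sequence, items):
    """Derive y, z, w from a setup sequence (one item per period).

    sequence[t] is the item the machine is set up for in period t; period 0 is
    the initial status. Returns y[t][i], z[t][i] and w[t][(i, j)].
    """
    y = {t: {i: int(sequence[t] == i) for i in items} for t in range(len(sequence))}
    z, w = {}, {}
    for t in range(1, len(sequence)):
        # Smallest values allowed by z >= y_t - y_{t-1} and w >= y_{t-1} + y_t - 1.
        z[t] = {i: max(0, y[t][i] - y[t - 1][i]) for i in items}
        w[t] = {(i, j): max(0, y[t - 1][i] + y[t][j] - 1)
                for i in items for j in items}
    return y, z, w


items = ["A", "B"]
schedule = ["A", "A", "B", "B", "A"]      # hypothetical status in periods 0..4
y, z, w = setup_variables(schedule, items)

# Exactly one item is set up in each period.
print(all(sum(y[t].values()) == 1 for t in y))                    # True
# Startups as (period, item) pairs.
print([(t, i) for t in z for i in items if z[t][i]])              # [(2, 'B'), (4, 'A')]
# Change overs between different items as (period, (from, to)) pairs.
print([(t, ij) for t in w for ij in w[t] if w[t][ij] and ij[0] != ij[1]])
# [(2, ('A', 'B')), (4, ('B', 'A'))]
```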

To obtain a complete formulation of the small buckets planning problem,
classical flow conservation constraints are used to model the satisfaction of
demand, as well as capacity constraints. In a model with startup time $\gamma^i$ to start
producing item i, the production capacity constraints are formulated as

$$ x^i_t \le U^i_t y^i_t - \gamma^i z^i_t \quad \text{for all } i, t. \tag{145} $$

In a model with change over time $\gamma^{ji}$ for switching from j to i, the production
capacity constraints are formulated as

$$ x^i_t \le U^i_t y^i_t - \sum_j \gamma^{ji} w^{ji}_t \quad \text{for all } i, t. \tag{146} $$



Several optimization approaches have been proposed in the literature for
solving small buckets multi item lot sizing problems with capacity constant over
time (but different for each item). Karmarkar and Schrage propose, and test
on small instances, a Lagrangean relaxation and branch and bound approach for
the model with startup costs. Wolsey solves problems with change over costs
by branch and bound using the extended unit flow reformulation (see below) to
model the setup decisions. Constantino solves larger instances, with up to
5 items and 36 periods, of problems with startup costs using a branch and cut
approach. Vanderbeck solves problems with startup times using a column
generation approach combined with cut generation.

Unit Flow Reformulation. The formulation (141)-(144) used to model the
setup decisions can be tightened and reformulated as a network flow problem
consisting of sending one unit of flow from period 1 to period n. The flow
models the status of the machine through the time periods. This formulation is the
tightest possible to represent the machine status decisions in the sense that the
constraint matrix is totally unimodular. This formulation, proposed in Karmarkar
and Schrage, has been tested in Wolsey [81]. It is defined by



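\begin{align}
& \sum_i y^i_0 = 1 \tag{147} \\
& \sum_j w^{ij}_t = y^i_{t-1} && \text{for all } i, t \tag{148} \\
& z^j_t + w^{jj}_t = y^j_t && \text{for all } j, t \tag{149} \\
& z^j_t = \sum_{i : i \ne j} w^{ij}_t && \text{for all } j, t \tag{150} \\
& w^{ij}_t, z^i_t, y^i_t \in \{0,1\} && \text{for all } i, j, t \tag{151}
\end{align}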



where constraint (147) defines the initial (period 0) status of the machine
(usually these variables have a fixed value corresponding to the current status of
the machine). Constraint (148) stipulates that if the machine was set up for i in
period $t - 1$, then it must switch from i to some j (possibly equal to i) in period
t. Constraint (149) expresses that if the machine is set up for j in period t, then
either it was already set up for j in period $t - 1$, or it starts producing item j in
period t. Constraint (150) says that if the machine starts producing item j in
period t, it must switch from a different product in period $t - 1$.
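
The flow interpretation can be checked on a small schedule: one unit of flow sits on the item the machine is set up for and moves along the arc (i, j) between consecutive periods. The sketch below builds the flow variables for a given sequence and asserts the three flow equations described above. The function name, the item labels and the schedule are assumptions made for illustration.

```python
def unit_flow_variables(sequence, items):
    """Build y, z, w for a setup sequence and check the unit flow equations.

    sequence[t] is the item the machine is set up for in period t (t = 0 is the
    initial status); w[t][(i, j)] = 1 when the unit of flow moves from i to j.
    """
    periods = range(1, len(sequence))
    y = {t: {i: int(sequence[t] == i) for i in items} for t in range(len(sequence))}
    w = {t: {(i, j): int(sequence[t - 1] == i and sequence[t] == j)
             for i in items for j in items} for t in periods}
    # Startup of j: flow entering j from a different item.
    z = {t: {j: sum(w[t][(i, j)] for i in items if i != j) for j in items}
         for t in periods}
    for t in periods:
        for i in items:
            # Flow leaving i equals the status of i in the previous period.
            assert sum(w[t][(i, j)] for j in items) == y[t - 1][i]
            # Status of i: either it started up or it stayed on i.
            assert z[t][i] + w[t][(i, i)] == y[t][i]
    return y, z, w


items = ["A", "B", "C"]
y, z, w = unit_flow_variables(["A", "A", "C", "B", "B"], items)   # hypothetical
print(sum(y[0].values()))                                 # 1 (initial status)
print([(t, j) for t in z for j in items if z[t][j]])      # [(2, 'C'), (3, 'B')]
```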

Alternative formulations for modelling changeovers in production planning
and scheduling problems are surveyed in Wolsey.

Models with Two Setups per Period. Haase has proposed a more
flexible interpretation of the setup decisions. Suppose that $y^i_t = 1$ if the machine
is set up for item i at the end of period t, and 0 otherwise. The natural formulation
(141)-(144), or the unit flow formulation (147)-(151), can still be used to model
the sequence of admissible setup decisions. But with this interpretation of the
setup variables, it is now possible to produce two items in each period t. These
are the items for which the machine is set up at the end of periods $t - 1$ and t. The
setup modification, if any, now occurs during period t between the production
lots of the two products. So we only need to reformulate the capacity constraints.
For instance, in the case of startup times, this gives

\begin{align}
& x^i_t \le U^i_t (y^i_{t-1} + y^i_t - w^{ii}_t) - \gamma^i z^i_t && \text{for all } i, t \tag{152} \\
& \sum_i (a^i x^i_t) + \sum_i (\gamma^i z^i_t) \le U_t \tag{153}
\end{align}

Kimms [32] solves multi item small buckets problems with two setups per
period using a Lagrangean relaxation approach.



Sequencing Models or m-Setups per Period. As a final extension of the 
small buckets models, we can mention the pure batch sequencing problem, or 
problems with many setups per period. In this case, we still need to know the 
sequence of products in order to compute the change over times, but we may 
produce in each period as many items or batches as feasible given the available 
capacity. 

The natural formulation (141)-(144), or the unit flow formulation (147)-(151),
can still be used to model the sequence of admissible setup decisions inside each
period, but with a different interpretation of the index t as the batch number
rather than the period number. For instance, $y^i_t = 1$ means that item i is the
item produced during the t-th batch.

An alternative formulation in this case is to use the asymmetric traveling 
salesman problem (ATSP) to formulate the sequence of setup decisions inside 
each period. 

Several papers in the literature report on such applications and propose
optimization approaches for similar problems. Kang, Malik and Thomas [30] propose
and test a column generation approach to solve a multi period small buckets
sequencing problem with change over costs, and compare with a direct branch and
bound approach using the ATSP formulation for the setup decisions. Batta and
Teghem [8] solve a multi item single period problem with change over times by
branch and bound, again using an ATSP formulation.



5 Formulations of Multi-level Models 

We study in this section a general class of multi level production planning models.
First we recall the description of the multi level lot sizing problem from Section
2.2 and analyze the classical level by level decomposition approach to show that
it is suboptimal. Then we survey the optimization approaches from the literature.
Finally we present the echelon stock reformulation that plays an important role
in all optimization approaches published so far.



5.1 Multi-level Models and the Decomposition Approach 



We recall first the formulation of the general product structure capacitated
multi-level lot-sizing model from Section 2.2 (see for instance Billington, McClain
and Thomas). The product structures are classified into Series, Assembly or
General structures, as represented in Figure U]

\begin{align}
& [\ldots] \tag{154} \\
& s^i_{t-1} + x^i_{t-\gamma^i} = \Big[ d^i_t + \sum_j r^{ij} x^j_t \Big] + s^i_t && \text{for all } i, t \tag{155} \\
& x^i_t \le M y^i_t && \text{for all } i, t \tag{156} \\
& [\ldots] && \text{for all } t, k \tag{157} \\
& x^i_t, s^i_t \ge 0, \; y^i_t \in \{0,1\} && \text{for all } i, t \tag{158}
\end{align}



A classical approach used for solving multi level planning problems is to de- 
compose the problem level by level into single level subproblems, solved sequen- 
tially from end-products to raw materials. The production plans at each level 
define the demand at subsequent levels. This is the approach used in MRP-type
systems. This separate optimization at each level yields suboptimal production
plans. This sequential planning procedure, and its suboptimality, are illustrated 
in the following simple example. 

Suppose that we have a 3-period lot-sizing problem with 2 levels, one item
at each level, and a serial product structure where one unit of the raw material
(i = 2) is required to produce one unit of the finished product (i = 1). For
simplicity, we assume that the lead time $\gamma^i$ is zero for each item. The external
or independent demand for the finished product is $d^1 = (10, 15, 20)$. There is no
external demand for the raw material. There is a fixed ordering cost of $f^2_t = 200$
for the raw material, and a fixed production cost of $f^1_t = 100$ for the finished
product, for all t. There is no unit production cost. The inventory cost is $h^i_t = 5$,
for all i, t. This planning problem corresponds to the fixed charge minimum
cost network flow problem represented in Figure 13.

Fig. 13. A 2-level serial production planning example

In the classical decomposition approach, we plan first the production of the
finished product in order to satisfy its external demand, by solving the
corresponding single item ULS problem. The optimal solution with a cost of 275 is to
produce 25 units in period 1, stock 15 from period 1 to period 2, and produce
20 units in period 3.

This production plan defines the internal or dependent demand for the raw
material: 25 units of raw material have to be available in period 1, and 20 units
have to be available in period 3. We then solve the ULS subproblem for the raw
material, and find that it is optimal, at a cost of 400, to order 45 units in
period 1 and stock 20 units from period 1 to period 3. Note that an alternative
optimal solution for the raw material is to order twice (25 units in period 1 and
20 units in period 3), and avoid the inventory costs. 

So, we have obtained a global (finished product and raw material) production
plan with a cost of 675. However, the optimal solution of this network flow
problem, with a cost of 575, is to order 45 units of raw material in period 1,
transform these units directly into finished product in period 1, and stock the
finished product to satisfy demand in periods 2 and 3. Furthermore, if one
enumerates all extreme solutions (those satisfying the decomposition property in
Theorem 1) of the ULS subproblem for the finished product at the first step of
the sequential procedure, then the worst extreme solution of this subproblem
(which is to produce the 45 units in period 1, and to stock for periods 2 and 3)
must be selected in order to find the globally optimal solution. 
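
The arithmetic of this example can be checked by brute force. The sketch below is not part of the original text. It enumerates the extreme solutions of each single-item problem (every setup period covers the demand up to the next setup), first sequentially as in the decomposition approach and then jointly over both levels. All function and variable names are hypothetical, and the enumeration is only practical for very small instances such as this one.

```python
from itertools import product


def plan_from_setups(demand, setups):
    """Extreme plan: each setup period produces all demand up to the next setup."""
    plan, last = [0] * len(demand), None
    for t, d in enumerate(demand):
        if setups[t]:
            last = t
        if d > 0:
            if last is None:      # demand occurs before the first setup: infeasible
                return None
            plan[last] += d
    return plan


def plan_cost(demand, plan, setup_cost, hold_cost):
    """Fixed cost per period with production, plus cost of end-of-period stock."""
    cost = stock = 0
    for d, q in zip(demand, plan):
        cost += setup_cost if q > 0 else 0
        stock += q - d
        cost += hold_cost * stock
    return cost


def solve_uls(demand, setup_cost, hold_cost):
    """Cheapest extreme plan, found by enumerating all setup patterns."""
    best = None
    for setups in product((0, 1), repeat=len(demand)):
        plan = plan_from_setups(demand, setups)
        if plan is None:
            continue
        cost = plan_cost(demand, plan, setup_cost, hold_cost)
        if best is None or cost < best[0]:
            best = (cost, plan)
    return best


demand = [10, 15, 20]               # external demand for the finished product
F_FIN, F_RAW, HOLD = 100, 200, 5    # fixed costs and unit inventory cost

# Sequential decomposition: finished product first, then raw material.
fin_cost, fin_plan = solve_uls(demand, F_FIN, HOLD)
raw_cost, raw_plan = solve_uls(fin_plan, F_RAW, HOLD)
sequential_total = fin_cost + raw_cost

# Integrated optimization: try every extreme plan of the finished product.
integrated_total, integrated_plan = None, None
for setups in product((0, 1), repeat=len(demand)):
    plan = plan_from_setups(demand, setups)
    if plan is None:
        continue
    total = plan_cost(demand, plan, F_FIN, HOLD) + solve_uls(plan, F_RAW, HOLD)[0]
    if integrated_total is None or total < integrated_total:
        integrated_total, integrated_plan = total, plan

print(fin_cost, raw_cost, sequential_total)   # 275 400 675
print(integrated_total, integrated_plan)      # 575 [45, 0, 0]
```

The integrated search selects the finished-product plan that costs 375 on its own, which is the most expensive extreme plan of that subproblem, in line with the remark above.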

We have illustrated with this naive example that it seems very difficult to
optimize the production plans by solving the single-level subproblems
sequentially. Therefore, we need to solve integrated models, such as (154)-(158),
taking the planning decisions at all levels simultaneously, in order to optimize
the quality of the production plans. 

5.2 Optimization of Multi-level Models 

We briefly review the known complexity results and the optimization approaches
used to solve various multi-level planning problems. 

The uncapacitated problem (without constraint (157)) with a serial product
structure (see Figure 4) is the simplest problem in the class. It can be solved in
polynomial time by dynamic programming (Zangwill [85] and Love [41]) using
the same decomposition property as for ULS, and the technique of Martin [44]
can be used to transform this dynamic program into a compact (i.e. with a number
of variables and constraints polynomial in the number of items and time periods)
linear extended reformulation of the problem (see Pochet [52]). The solution of
this problem using a branch and cut approach, and a partial description of the
convex hull of solutions, is presented in Pochet [52]. 

The next problem studied in the literature is the uncapacitated problem with
an assembly product structure (see Figure 4). The computational complexity of
this problem is still an open question. Afentakis, Gavish and Karmarkar [1] have
solved problems with up to 50 items using a Lagrangean relaxation approach
based on the echelon stock reformulation and combined with a specialized branch
and bound algorithm. 

The uncapacitated problem with general product structure has been tackled
by Afentakis and Gavish [2] using a Lagrangean relaxation approach, and by
Pochet and Wolsey [55] using a cut and branch approach (i.e. cuts are added
only at the first node of the branch and bound tree, before any branching). 

The full problem (154)-(158) with general product structure, capacities and
setup times has been addressed by Tempelmeier and Derstroff [66], who obtain
heuristic solutions with a performance guarantee using a Lagrangean relaxation
approach, and by Stadtler [65] using a branch and bound approach and the
extended linear reformulations for the ULS subproblems. 

All the optimization applications mentioned for assembly and general product
structures, using Lagrangean relaxation, branch and bound or branch and
cut approaches, are based on the same reformulation of the problem. First,
problem (154)-(158) is reformulated using the echelon stock concept. This
transforms the initial formulation into a series of ULS subproblems linked by
capacity constraints and by product structure constraints (see Section 5.3
below). Then, the ULS part of the problem is reformulated in the branch and bound
or branch and cut approaches, or the linking constraints are relaxed in the
Lagrangean relaxation approach. 

Better relaxations are available, but have not yet been used for solving complex
capacitated multi-level problems. For instance, the reformulations and valid
inequalities found for big buckets capacitated models with setup times could be
used to tighten the formulations further. Valid inequalities for fixed charge
network flow problems could be generated on the multi-level network flow problem
(see Van Roy and Wolsey [74]). 

5.3 The Echelon Stock Reformulation 

The formulation (154)-(158) of the general multi-level planning problem does
not contain explicitly the single-item ULS subproblems (except for finished
products) because of the presence of both dependent and independent demand
for the items. By using the concept of echelon stock, introduced by Clark and
Scarf [11], one can obtain a reformulation that contains explicitly the ULS
subproblems. This allows one to design decomposition optimization algorithms
based on our knowledge of how to solve and reformulate ULS. 

For simplicity, we assume that the lead times $\gamma^i$ are equal to zero in
(155). The echelon stock $E_t^i$ of item $i$ in period $t$ is the total stock of
component $i$ in the production system, as item $i$ or in items appearing further
on in the production process. It can be defined as 

$$E_t^i = s_t^i + \sum_{j \in S(i)} r^{ij} E_t^j \qquad (159)$$

because, in addition to the stock $s_t^i$ of item $i$ held as item $i$, there are
$r^{ij}$ units of item $i$ present in each unit of stock of item $j$. 




Fig. 14. A general product structure 



Figure 14 gives an example of general product structure. For this example,
the echelon stocks of the different items are 

$$E_t^1 = s_t^1 \qquad (160)$$

$$E_t^2 = s_t^2 + 2E_t^1 = s_t^2 + 2s_t^1 \qquad (161)$$

$$E_t^3 = s_t^3 + 2E_t^2 + E_t^1 = s_t^3 + 2s_t^2 + 5s_t^1 \qquad (162)$$

$$E_t^4 = s_t^4 + 3E_t^2 = s_t^4 + 3s_t^2 + 6s_t^1 \qquad (163)$$

$$E_t^5 = s_t^5 + 2E_t^2 + 3E_t^3 = s_t^5 + 3s_t^3 + 8s_t^2 + 19s_t^1 \qquad (164)$$

$$E_t^6 = s_t^6 + E_t^3 = s_t^6 + s_t^3 + 2s_t^2 + 5s_t^1 \qquad (165)$$
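
The coefficients in (160)-(165) can be recomputed mechanically from the recursion (159). The following sketch is an illustration only. The dictionary `r` encodes the direct requirements as they can be read off equations (160)-(165), which is an assumption about the product structure of Figure 14, and the names `units_in` and `echelon_stock` are hypothetical.

```python
from functools import lru_cache

# r[i][j]: units of item i needed per unit of its immediate successor j.
# Read off equations (160)-(165); an assumption of this sketch.
r = {1: {}, 2: {1: 2}, 3: {2: 2, 1: 1}, 4: {2: 3}, 5: {2: 2, 3: 3}, 6: {3: 1}}


@lru_cache(maxsize=None)
def units_in(i, j):
    """Total number of units of item i contained in one unit of item j."""
    if i == j:
        return 1
    return sum(r_ik * units_in(k, j) for k, r_ik in r[i].items())


def echelon_stock(i, s):
    """Echelon stock of item i from natural stocks s, using recursion (159)."""
    return s[i] + sum(r_ij * echelon_stock(j, s) for j, r_ij in r[i].items())


# Coefficients of the natural stocks in the echelon stock of item 5, as in (164).
coeffs_5 = {j: units_in(5, j) for j in r if units_in(5, j) > 0}
print(coeffs_5)   # {1: 19, 2: 8, 3: 3, 5: 1}

# The recursion and the closed form agree for arbitrary natural stocks.
s = {1: 4, 2: 0, 3: 7, 4: 1, 5: 2, 6: 3}
for i in r:
    assert echelon_stock(i, s) == sum(units_in(i, j) * s[j] for j in r)
```

Memoizing `units_in` keeps the recursion cheap even when an item reaches the same successor through several paths, as item 5 does here.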



We can use equation (159) to define $R^{ij}$ as the number of items $i$ present
in each unit of item $j$, for all $i, j$ (i.e. $E_t^i = \sum_j R^{ij} s_t^j$),
and to replace the natural stock variables $s_t^i$ by the echelon stock variables
$E_t^i$ in formulation (154)-(158). This gives the formulation 

$$E_{t-1}^i + x_t^i = \sum_j R^{ij} d_t^j + E_t^i \quad \text{for all } i,t \qquad (166)$$

$$x_t^i \le M y_t^i \quad \text{for all } i,t \qquad (167)$$

$$E_t^i - \sum_{j \in S(i)} r^{ij} E_t^j \ge 0 \quad \text{for all } i,t \qquad (168)$$

$$\sum_i \alpha^{ik} x_t^i + \sum_i \beta^{ik} y_t^i \le L_t^k \quad \text{for all } t,k \qquad (169)$$

$$x_t^i,\ E_t^i \ge 0,\ y_t^i \in \{0,1\} \quad \text{for all } i,t \qquad (170)$$

where (168) is the product structure linking constraint coming from the
non-negativity of the natural stock variables, (169) is the capacity linking
constraint, and (166)-(167) define independent ULS subproblems, one for each item. 



6 New Directions 

We conclude in this section by giving a personal view on some new directions to
be investigated in modeling production planning problems. These include better
productivity models, and in particular better models for capacity utilization
and setup times, new models to represent the product structure (or recipes)
in process industries, and the study of continuous time planning and scheduling
models as opposed to the discrete time models studied in this review. As pointed
out in Kuik et al. [35], such extensions are crucial for the vitality of the
production planning and lot-sizing research. 

We also define some challenges for the future of this research field. 



6.1 Setup Times, Capacity Utilization, and Productivity Models 

We have already mentioned in Section 4.2 that the optimal solutions of multi-item
capacitated lot-sizing models (MICLS) with setup cost may well correspond
to solutions that are infeasible and/or suboptimal in reality. The main
difficulty there was to estimate a priori the setup cost as the opportunity cost
of the capacity used during
setup. This was the main reason to study similar capacitated models, but with 
setup times instead of setup costs, using constraint IKUI to model capacity uti- 
lization. 

For such capacitated models with setup times, a first direction of research
is to investigate the polyhedral structure of multi-item subproblems involving
both capacity utilization and demand satisfaction, as in Miller et al. [46]. This
will allow one to obtain new classes of valid inequalities and to solve problems
using better relaxations than the one obtained from the reformulation of the
simple uncapacitated subproblems. 

Another extension is to develop new capacity utilization or productivity
models, and to study the corresponding formulations. 

— In classical MRP databases, the routing of an item is defined as a fixed
sequence of processing or production steps, each performed in a specific
workcenter (i.e. on a specific resource), requiring a fixed setup time and a
variable processing time per unit produced. This capacity utilization model
can be introduced in the capacitated multi-level lot-sizing formulation
(154)-(158) using $\alpha_l^{ik}$ and $\beta_l^{ik}$ to represent respectively
the unit processing time and fixed setup time for item $i$, on resource $k$, in
the $l$-th processing period, for $l = 1, \ldots, \gamma^i$. The capacity
constraint (157) then becomes 

$$\sum_i \sum_{l=t-\gamma^i+1}^{t} \alpha_{t-l+1}^{ik} x_l^i + \sum_i \sum_{l=t-\gamma^i+1}^{t} \beta_{t-l+1}^{ik} y_l^i \le L_t^k \quad \text{for all } t,k \qquad (171)$$

— Product similarities or product families also have some influence on the setup
and changeover times. On a specific resource or machine, switching from
one product family to another requires more time than switching from one
product to another product of the same family. Van de Velde [70] suggested
new capacity utilization models to better reflect such situations, see Simpson
and Erenguc [64]. 

— In activity based costing and management systems (ABC/ABM), hierarchical
models for activities and costs are used to better approximate the (production)
cost structure. These can be modelled using joint or hierarchical
setup cost models, as in Degraeve and Roodhooft [16]. 

6.2 Process Industry: Products and Operations Structure 

The multi-level MRP-II model, and its typical formulation (154)-(158), should be
generalized in order to be applicable to processes other than discrete parts
manufacturing. In particular, for chemical or process industries, production
planning and sequencing problems require some generalization of BOM structures to
allow the modelling of OPERATIONS consuming items as INPUTS, producing
items as OUTPUTS in fixed or variable proportions, and requiring some amount
of RESOURCES. The main difference with respect to discrete BOMs is that
several output items or co-products are generally produced simultaneously in a
single operation, see Westenberger and Kallrath [79], Kondili, Pantelides and
Sargent [33]. 

The basic example (and formulation) of an operations model is 

$$s_{t-1}^i + \sum_o \rho_{out}^{io} x_{t-\gamma^{oi}}^o = \Big[ d_t^i + \sum_o \rho_{in}^{io} x_t^o \Big] + s_t^i \quad \text{for all } i,t \qquad (172)$$

$$LB^o y_t^o \le x_t^o \le UB^o y_t^o \quad \text{for all } o,t \qquad (173)$$

$$x_t^o,\ s_t^i \ge 0,\ y_t^o \in \{0,1\} \quad \text{for all } i,o,t \qquad (174)$$

where the index $o$ is used to model operations, $x_t^o$ represents the amount of
time spent on performing operation $o$ in period $t$, and $y_t^o = 1$ indicates
that operation $o$ is carried out in period $t$. The extended BOM structure is
defined by $\rho_{out}^{io}$ and $\rho_{in}^{io}$, representing respectively the
number of units of item $i$ produced and consumed per time unit of operation $o$,
for all operations $o$ and items $i$. The data $\gamma^{oi}$ represents the lead
time needed for operation $o$ to produce the output item $i$. 



Many of the reformulation techniques presented in this paper can be used to 
tighten the formulation of this operations planning model. 

6.3 Continuous Time Scheduling Models 

In production planning, small buckets models are used when there is a need (cost,
capacity, ...) to represent the detailed sequence and timing of events (such as
production start, production end, empty or full storage tank, ...) through time.
Very often in such models, the period time lengths are very small in order to
represent these events at precise moments in time, but the number of events
occurring in the planning horizon is small. To avoid having too many time periods,
and too large models, continuous time models are used. They use variable period
lengths in order to represent the events in time, see Zhang and Sargent [86],
Pinto and Grossmann [51], Pochet, Tahmassebi and Wolsey [59]. 

Consider an example planning problem where a number of items, indexed
by $i$ or $j$, have to be produced on a set of independent parallel production
lines, indexed by $k$. There is a demand of $d^i$ units of item $i$ to meet, and
each item $i$ has a given production rate on machine $k$, expressed in units of
$i$ per unit of production time. When there is a changeover on machine $k$ from
item $i$ to item $j$, a given amount of time is lost, and a limited amount of
time is available on line $k$ during the planning horizon. 

A continuous time formulation for this model is given by constraints (175)-(182),
where the variables are the start time of item $i$ on line $k$, the end
time of item $i$ on line $k$, a variable taking the value 1 if item $i$ is
produced on line $k$, and a variable equal to 1 if there is a changeover from
item $i$ to item $j$ on line $k$. Item 0 is a fictive item used to start and
finish the sequence on each machine. Constraint (175) imposes demand
satisfaction, constraint (176) is the line capacity constraint, constraint (177)
imposes the starting time of item $j$ when it directly follows item $i$ on line
$k$, constraint (178) fixes the value of the setup variables, and (179)-(182)
model the sequencing constraints (i.e. each line must have a starting item and a
finishing item, and each item must have a predecessor and a successor). 

6.4 Challenges 

As a conclusion, we summarize some key challenges for the future of this research
field:

— The multi-item production planning model (154)-(158) with general product
structure, capacities and setup times must be solved in practice. This 
means that specific optimization codes have to be developed, that heuristic 
algorithms based on linear programming relaxations must be designed in 
order to produce good solutions for large instances, and fast and specialized 
algorithms for solving the large size LP relaxations have to be developed. 

— Other more general models have to be constructed in order to come closer 
to interesting and real planning problems. This includes new capacity uti- 
lization models, joint setup times and costs models, operations planning and 
scheduling models. 

— The models have to be changed when the limits of such deterministic discrete 
time planning problems have been reached. This includes the development 
and resolution of continuous time scheduling models, as well as of stochastic 
models. 

References 

1. P. Afentakis, B. Gavish and U. Karmarkar, “Computationally efficient optimal 
solutions to the lot-sizing problem in multistage assembly systems”. Manage- 
ment Science 30(2), 222-239, 1984. 

2. P. Afentakis and B. Gavish, “Optimal lot-sizing algorithms for complex product 
structures”. Operations Research 34, 237-249, 1986. 

3. A. Aggarwal and J. Park, “Improved algorithms for economic lot-size prob- 
lems”, Operations Research 41, 549-571, 1993. 

4. A. Agra and M. Constantino, “Lotsizing with backlogging and start-ups: the 
case of Wagner-Whitin costs”, Operations Research Letters 25 (2), 81-88, 1999. 

5. R.N. Anthony, “Planning and control systems: a framework for analysis”. Har- 
vard University Press, Cambridge, Mass., 1965. 

6. I. Barany, T.J. Van Roy and L.A. Wolsey, “Uncapacitated lot sizing: the convex 
hull of solutions”. Mathematical Programming Study 22, 32-43, 1984. 






7. I. Barany, T.J. Van Roy and L.A. Wolsey, “Strong formulations for multi-item 
capacitated lot-sizing”, Management Science 30, 1255-1261, 1984. 

8. C. Batta and J. Teghem, “Optimization of production scheduling in plastics 
processing industry”, Jorbel (Belgian Journal of operations research, statistics 
and computer science) 34 (2), 55-78, 1994. 

9. P.J. Billington, J.O. McClain, L.J. Thomas, “Mathematical programming 
approaches to capacity constrained MRP systems: review, formulation and 
problem reduction”, Management Science 29 (10), 1126-1141, 1983. 

10. W.-H. Chen and J.-M. Thizy, “Analysis of relaxations for the multi-item ca- 
pacitated lot-sizing problem”, Annals of Operations Research 26, 29-72, 1990. 

11. A.J. Clark and H. Scarf, “Optimal policies for multi echelon inventory 
problems”, Management Science 6, 475-490, 1960. 

12. M. Constantino, “A cutting plane approach to capacitated lot-sizing with start- 
up costs”, Mathematical Programming 75 (3), 353-376, 1996. 

13. M. Constantino, “Lower bounds in lot-sizing models: A polyhedral study”, 
Mathematics of Operations Research 23 (1), 101-118, 1998. 

14. W.B. Crowston, M.H. Wagner, “Dynamic lot size models for multi stage as- 
sembly systems”. Management Science 20(1), 14-21, 1973. 

15. W.B. Crowston, M.H. Wagner, J.F. Williams, “Economic lot size determination 
in multi stage assembly systems”. Management Science 19(5), 517-527, 1973. 

16. Z. Degraeve, F. Roodhooft, “Improving the efficiency of the purchasing process 
using total cost of ownership information: The case of heating electrodes at 
Cockerill-Sambre S.A.”, European Journal of Operational Research 112, 42-53, 
1999. 

17. M. Diaby, H.C. Bahl, M.H. Karwan and S. Zionts, “A Lagrangean relaxation 
approach to very large scale capacitated lot-sizing”. Management Science 38 
(9), 1329-1340, 1992. 

18. S.E. Elmaghraby, “The economic lot-scheduling problem (ELSP): reviews and 
extensions”. Management Science 24, 587-598, 1978. 

19. G.D. Eppen and R.K. Martin, “Solving multi-item capacitated lot-sizing 
problems using variable redefinition”, Operations Research 35, 832-848, 1987. 

20. A. Federgruen and M. Tzur, “A simple forward algorithm to solve general 
dynamic lot-size models with n periods in O(n log n) or O(n) time”, Management 
Science 37, 909-925, 1991. 

21. B. Fleischmann, “The discrete lotsizing and scheduling problem”, European 
Journal of Operational Research 44(3), 337-348, 1990. 

22. B. Fleischmann, “The discrete lotsizing and scheduling problem with sequence- 
dependent setup costs”, European Journal of Operational Research, 1994. 

23. M. Florian and M. Klein, “Deterministic production planning with concave 
costs and capacity constraints”. Management Science 18, 12-20, 1971. 

24. M.X. Goemans, “Valid inequalities and separation for mixed 0-1 constraints 
with variable upper bounds”. Operations Research Letters 8, 315-322, 1989. 

25. M. Grötschel, L. Lovász and A. Schrijver, “The ellipsoid method and its 
consequences in combinatorial optimization”, Combinatorica 1, 169-197, 1981. 

26. O. Günlük and Y. Pochet, “Mixing mixed integer rounding inequalities”, 
CORE discussion paper 9811, Université catholique de Louvain, Belgium, 1998. 
(to appear in Mathematical Programming) 

27. Haase, “Lotsizing and scheduling for production planning”. Lecture notes in 
economics and mathematical systems 408, Springer, Berlin, 1994. 

28. F.W. Harris, “How many parts to make at once”. Factory, the Magazine of 
Management 10(2), 1913. 







29. A.C. Hax and H.C. Meal, “Hierarchical integration of production planning 
and scheduling”, in M. Geisler editor, TIMS studies in Management Science, 
chapter 1, North Holland/American Elsevier, New York, 1975. 

30. S. Kang, K. Malik, L.J. Thomas, “Lotsizing and scheduling in parallel machines 
with sequence dependent setup costs”, Working paper 97-07, Johnson Graduate 
School of Management, Cornell University, 1997. 

31. U.S. Karmarkar and L. Schrage, “The deterministic dynamic product cycling 
problem”. Operations Research 33, 326-345, 1985. 

32. A. Kimms, “Multi-level lot sizing and scheduling: methods for capacitated, 
dynamic and deterministic models”, Physica-Verlag (production and logistics 
series), Heidelberg, 1997. 

33. E. Kondili, C.C. Pantelides, R.W.H. Sargent, “A general algorithm for 
short-term scheduling of batch operations - 1. MILP formulation”, Computers and 
Chemical Engineering 17, 211-227, 1993. 

34. J. Krarup and O. Bilde, “Plant location, set covering and economic lot sizes: 
an O(mn) algorithm for structured problems”, in “Optimierung bei 
Graphentheoretischen und Ganzzahligen Probleme”, L. Collatz et al. eds, 
Birkhäuser Verlag, Basel, 155-180, 1977. 

35. R. Kuik, M. Salomon and L.N. van Wassenhove, “Batching decisions: structure 
and models”, European Journal of Operational Research 75, 243-263, 1994. 

36. L.S. Lasdon and R.G. Terjung, “An efficient algorithm for multi-item schedul- 
ing”, Operations Research 19, 946-969, 1971. 

37. J. Leung, T.M. Magnanti and R. Vachani, “Facets and algorithms for capaci- 
tated lot-sizing”. Mathematical Programming 45, 331-359, 1989. 

38. M. Loparic, Y.Pochet and L.A. Wolsey, “Uncapacitated lot-sizing with sales 
and safety stocks”. Mathematical Programming 89 (3), 487-504, 2001. 

39. M. Loparic, H. Marchand and L.A. Wolsey, “Dynamic knapsack sets and 
capacitated lot-sizing”, CORE discussion paper 2000/47, Université catholique 
de Louvain, Louvain-la-Neuve, Belgium, 2000. 

40. L. Lovasz, “Graph theory and integer programming” , Annals of Discrete Math- 
ematics 4, 141-158, 1979. 

41. S.F. Love, “A facilities in series inventory model with nested schedules”. Man- 
agement Science 18(5), 327-338, 1972. 

42. T.M. Magnanti and R. Vachani, “A strong cutting plane algorithm for pro- 
duction scheduling with changeover costs”. Operations Research 38, 456-473, 
1990. 

43. H. Marchand and L.A. Wolsey, “The 0-1 knapsack problem with a single con- 
tinuous variable”. Mathematical Programming 85, 15-33, 1999. 

44. R.K. Martin, “Generating alternative mixed-integer programming models 
using variable redefinition”, Operations Research 35, 331-359, 1987. 

45. R.K. Martin, “Using separation algorithms to generate mixed integer model 
reformulations”. Operations Research Letters 10, 119-128, 1991. 

46. A.J. Miller, G.L. Nemhauser and M.W.P. Savelsbergh, “On the polyhedral 
structure of a multi-item production planning model with setup times”, CORE 
discussion paper 2000/52, Université catholique de Louvain, Louvain-la-Neuve, 
Belgium, 2000. 

47. G.L. Nemhauser and L.A. Wolsey, “Integer and combinatorial optimization”, 
Wiley, New York, 1988. 

48. G.L. Nemhauser and L.A. Wolsey, “A recursive procedure for generating all 
cuts for 0-1 mixed integer programs”, Mathematical Programming 46, 379-390, 
1990. 







49. J. Orlicky, “Material Requirements planning”, McGraw-Hill, New York, 1975. 

50. M.W. Padberg, T.J. Van Roy and L.A. Wolsey, “Valid inequalities for fixed 
charge problems”, Operations Research 33, 842-861, 1985. 

51. Pinto and Grossmann, “A logic-based approach to scheduling problems with 
resource constraints”, Computers and Chemical Engineering 21 (8), 801-818, 
1997. 

52. Y. Pochet, “Lot-sizing problems: reformulations and cutting plane 
algorithms”, PhD Thesis, Université catholique de Louvain, Belgium, 1987. 

53. Y. Pochet, “Valid inequalities and separation for capacitated economic lot- 
sizing”, Operations Research Letters 7, 109-116, 1988. 

54. Y. Pochet and L.A. Wolsey, “Lot-size models with backlogging: Strong formu- 
lations and cutting planes”, Mathematical Programming 40, 317-335, 1988. 

55. Y. Pochet and L.A. Wolsey, “Solving multi-item lot sizing problems using strong 
cutting planes”. Management Science 37, 53-67, 1991. 

56. Y. Pochet and L.A. Wolsey, “Lot-sizing with constant batches: Formulation and 
valid inequalities”, Mathematics of Operations Research 18, 767-785, 1993. 

57. Y. Pochet and L.A. Wolsey, “Polyhedra for lot-sizing with Wagner-Whitin 
costs”, Mathematical Programming 67, 297-323, 1994. 

58. Y. Pochet and L.A. Wolsey, “Algorithms and reformulations for lot-sizing 
problems”, DIMACS Series in Discrete Mathematics and Theoretical Computer 
Science, 20, 245-293, 1995. 

59. Y. Pochet, T. Tahmassebi and L.A. Wolsey, “Reformulation of a single stage 
packing model”. Report, Memips: Esprit project 20118, June 1996. 

60. R.L. Rardin and U. Choe, “Tighter relaxations of fixed charge network flow 
problems”, report J-79-18, Industrial and Systems Engineering, Georgia 
Institute of Technology, Atlanta, Georgia, 1979. 

61. R. Rardin and L.A. Wolsey, “Valid inequalities and projecting the multicom- 
modity extended formulation for uncapacitated fixed charge network flow prob- 
lems”, European Journal of Operations Research, Nov 1993. 

62. M. Salomon, “Deterministic lotsizing models for production planning”, PhD. 
Thesis, Erasmus Universiteit Rotterdam, The Netherlands, 1990. 

63. M. Salomon, L.G. Kroon, R. Kuik, L.N. Van Wassenhove, “Some extensions 
of the discrete lotsizing and scheduling problem”. Management Science 37(7), 
801-812, 1991. 

64. N.C. Simpson, S.S. Erenguc, “Production planning in multiple stage 
manufacturing environments with joint costs, limited resources and set-up times”, 
Technical report, Department of Management Science and Systems, University 
of Buffalo, 1998. 

65. H. Stadtler, “Mixed integer programming model formulations for dynamic 
multi-item multi-level capacitated lotsizing”, European Journal of Operational 
Research 94 (3), 561-581, 1996. 

66. H. Tempelmeier and M. Derstroff, “A Lagrangean-based heuristic for dynamic 
multilevel multiitem constrained lotsizing with setup times” , Management Sci- 
ence 42 (5), 738-757, 1996. 

67. J.M. Thizy and L.N. Van Wassenhove, “Lagrangean relaxation for the 
multi-item capacitated lot-sizing problem: a heuristic implementation”, IIE 
Transactions 17 (4), 308-313, 1985. 

68. W. Trigeiro, L.J. Thomas and J.O. McClain, “Capacitated lot sizing with setup 
times”, Management Science 35(3), 353-366, 1989. 

69. F. Vanderbeck, “Lot-sizing with start up times”. Management Science 44 (10), 
1409-1425, 1998. 







70. W. Van de Velde, Private communication, 1997. 

71. C.P.M. van Hoesel, “Models and algorithms for single-item lot sizing prob- 
lems”, Ph.D. Thesis, Erasmus Universiteit, Rotterdam, 1991. 

72. S. van Hoesel, A. Wagelmans and L.A. Wolsey, “Economic lot-sizing with start- 
up costs: the convex hull”, SIAM Journal of Discrete Mathematics, 1994. 

73. S. van Hoesel and A. Wagelmans, “An O(T^3) algorithm for the economic 
lot-sizing problem with constant capacities”, Management Science 42 (1), 142-150, 
1996. 

74. T.J. Van Roy and L.A. Wolsey, “Valid inequalities and separation for uncapac- 
itated fixed charge networks”, Operations Research Letters 4, 105-112,1985. 

75. A.F. Veinott, “Minimum concave cost solution of Leontief substitution models 
of multi-facility inventory systems”. Operations Research 17(2), 262-291, 1969. 

76. T.E. Vollmann, W.L. Berry, and D.C. Whybark, “Manufacturing Planning and 
Control Systems”, Third Edition, Richard D. Irwin, 1997. 

77. A.P.M. Wagelmans, C.P.M. van Hoesel and A.W.J. Kolen, “Economic 
lot-sizing: an O(n log n) algorithm that runs in linear time in the Wagner-Whitin 
case”, Operations Research 40, Supplement 1, 145-156, 1992. 

78. H.M. Wagner and T.M. Whitin, “Dynamic version of the economic lot size 
model”. Management Science 5, 89-96, 1958. 

79. H. Westenberger and J. Kallrath, “Formulation of a jobshop problem in process 
industry”, Preprint, Bayer, Leverkusen, January 1995. 

80. R.H. Wilson, “A scientific routine for stock control” , Harvard Business Review 
13, 1934. 

81. L.A. Wolsey, “Uncapacitated lot-sizing problems with start-up costs”. Opera- 
tions Research 37, 741-747, 1989. 

82. L.A. Wolsey, “MIP modelling of changeovers in production planning and 
scheduling problems”, European Journal of Operational Research 99, 154-165, 
1997. 

83. L.A. Wolsey, “Integer programming”, Wiley, New York, 1999. 

84. W.L Zangwill, “Minimum concave cost flows in certain networks” , Management 
Science 14, 429-450, 1968. 

85. W.L Zangwill, “A backlogging model and a multi-echelon model of a dynamic 
economic lot size production system - a network approach” , Management Sci- 
ence 15(9), 506-527, 1969. 

86. X. Zhang and R.W.H. Sargent, “A new unified formulation for process schedul- 
ing”, AIChE annual meeting. Paper 144c, St Louis, Missouri, 1993. 




Lagrangian Relaxation 



Claude Lemaréchal 

Inria, 655 avenue de l’Europe, Montbonnot, 38334 Saint Ismier, France 
Claude.Lemarechal@inrialpes.fr 



Abstract. Lagrangian relaxation is a tool to find upper bounds on a 
given (arbitrary) maximization problem. Sometimes, the bound is exact 
and an optimal solution is found. Our aim in this paper is to review this 
technique, the theory behind it, its numerical aspects, its relation with 
other techniques such as column generation. 



1 Introduction, Motivation 

1.1 Scope of the Paper 

One of the basic techniques in combinatorial optimization is bounding: given a 
(finite but) nasty set $S \subset \mathbb{R}^n$ and an “easy” function $f$ (say
linear or quadratic), find an upper bound for the optimal value of the problem 

$$\max f(x), \quad x \in S.$$

Lagrangian relaxation is a universal technique for that. To make it applicable, 
the feasible set $S$ must be written as $S = X \cap \{x : c(x) = 0\}$, where $X$
is “easy” in terms of maximizing $f$ over it, while the constraints
$c = (c_1, \ldots, c_m)$ are “complicating”. In fine, the technique produces a
bunch of optimal solutions to the problem $\max_{x \in \bar S} f(x)$, where
$\bar S$ is a certain convex enlargement of $S$. 
For a general introduction and motivation of this technique, see for example 
123 Chap. XII]; or also p3], Pl Chap. 6] for its application to combinatorial 
problems. 

We will first describe in §1.2 what Lagrangian relaxation is. Then a list of 
possible applications (§2) will illustrate its versatility. A subsequent section
will be devoted to theory; in particular we will study the relations between the
original problem and its relaxed version, i.e. between $S$ and $\bar S$. The end
of the paper will be devoted to numerical algorithms. 

Our development may be deemed too technical at places. In particular, we 
will not necessarily assume S to be finite. As a result, we will have to care about 
continuity and boundedness, while combinatorics belong to the polyhedral world, 
with everything contained in the unit cube. Indeed, there are several reasons for 
our attitude: 

(i) Perhaps paradoxically, it helps to think in abstract terms, forgetting as 
much as possible about linear algebra. The latter may make life easier, 
in particular by skipping analysis and calculus; however this has its price: 
linear algebra is often “the tree hiding the forest”, which prevents a sound 
understanding of what one is doing. 

(ii) SDP optimization, which has recently entered the combinatorial world (see 
[I] for example), is a form of Lagrangian relaxation; and of course, SDP 
optimization is definitely beyond the polyhedral world. 

(iii) Anyway, Lagrangian relaxation has a much larger field of applicability than 
combinatorial problems, and it does no harm to care about mathematical 
rigor. 

1.2 The Basic Idea of Lagrangian Relaxation 

The present section introduces the mechanism of Lagrangian relaxation as a sim- 
ple yet general tool. Consider first an optimization problem put in the abstract 
form 



$$\max f(x), \quad x \in X, \quad c_j(x) = 0, \quad j = 1, \ldots, m, \tag{1}$$

hereafter called the primal problem. We introduce the Lagrangian, a function of 
the primal variable x and of the dual variable $u \in \mathbb{R}^m$: 

$$X \times \mathbb{R}^m \ni (x, u) \mapsto L(x, u) := f(x) - \sum_{j=1}^m u_j c_j(x) = f(x) - u^\top c(x), \tag{2}$$

where the last equality introduces the notation c for the m-vector of constraint 
values. In plain words, the Lagrangian replaces each constraint by a linear “price” 
to be paid or received, according to the sign of $u_j$. Maximizing $L(\cdot, u)$ over the 
whole of X is therefore closely related to solving (1). In a sense, X can be 
considered as the “universe” in which x lives, while c represents the constraints 
that are “relaxed”. 

Definition 1 The dual function associated with (1), (2) is the function of u 
defined by 

$$\mathbb{R}^m \ni u \mapsto \theta(u) := \max_{x \in X} L(x, u). \tag{3}$$

The dual problem is 

$$\min \theta(u), \quad u \in \mathbb{R}^m. \tag{4}$$

□ 

Lagrangian relaxation is a very versatile technique, which accepts essentially 
any data X,f,c. The only really crucial assumption is the following, pragmatic 
but fundamental; it will be in force throughout: 

^1 Notationally, we should write sup and inf throughout instead of min and max, since 
nothing guarantees that the various suprema are attained. Despite our care for 
mathematical rigor, we will neglect this subtlety and use min and max uniformly. 

Note also that minimizing $\theta$ excludes the value $\theta(u) = +\infty$. As a result, the dual 
problem entails possible dual constraints, preventing the value $+\infty$ in (3). 

Assumption 2 Solving (1) is “difficult”, while solving (3) is “easy”. An oracle 
is available which solves (3) for given $u \in \mathbb{R}^m$. □ 

Lagrangian relaxation then tries to take advantage of this, and aims at finding 
an appropriate u. To explain why finding an appropriate u amounts to solving 
the dual problem (4), observe the following triviality: by definition of $\theta$, 

$$\theta(u) \geq f(x) \quad \text{for all } x \text{ feasible in (1) and all } u \in \mathbb{R}^m,$$

simply because c(x) = 0, so that L(x, u) = f(x). This relation is known as weak 
duality, which gives two motivations for the dual problem: 

- Because each $\theta(u)$ bounds from above the optimal cost in (1), minimizing $\theta$ 
amounts to finding the best possible such bound - a very relevant idea when 
bounding is at stake. 

- The other motivation is less known, perhaps more important, and certainly 
more subtle. Suppose we do want to solve (1) with the help of (3). In other 
words, suppose we want a u such that (3) produces some argmax $x_u$ of $L(\cdot, u)$ 
which also solves (1). In particular, this $x_u$ must be feasible in (1) and therefore 
satisfy $\theta(u) = f(x_u)$. In view of weak duality, we see that the only chance for 
our u is to minimize $\theta$. 
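Weak duality is easy to observe numerically when X is finite. The following is a minimal sketch, not taken from the text: the universe X = {0,1}^3, the weights and the single constraint are hypothetical toy data, and the oracle is a plain enumeration of X.

```python
from itertools import product

# Hypothetical toy data (not from the text): X = {0,1}^3 and a single
# dualized constraint c(x) = x1 + x2 + x3 - 2 = 0, so m = 1.
X = list(product((0, 1), repeat=3))
WEIGHTS = (5.0, 4.0, 3.0)

def f(x):
    """Linear objective to be maximized."""
    return sum(w * xi for w, xi in zip(WEIGHTS, x))

def c(x):
    """Value of the relaxed constraint; x is feasible when c(x) == 0."""
    return sum(x) - 2

def lagrangian(x, u):
    """L(x, u) = f(x) - u * c(x), as in (2) with m = 1."""
    return f(x) - u * c(x)

def theta(u):
    """Dual function (3), evaluated by enumerating the finite set X."""
    return max(lagrangian(x, u) for x in X)

primal_opt = max(f(x) for x in X if c(x) == 0)
dual_values = {u: theta(u) for u in (0.0, 2.0, 3.5, 4.0, 6.0)}

for u, value in dual_values.items():
    # Weak duality: every theta(u) is an upper bound on the primal optimum.
    assert value >= primal_opt
    print(f"u = {u:3.1f}   theta(u) = {value:5.2f}   primal optimum = {primal_opt}")
```

In this particular instance the smallest sampled bound coincides with the primal optimum; this need not happen in general.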

Remark 3 Suppose a problem with inequality constraints: 

$$\max f(x), \quad x \in X, \quad c_j(x) \leq 0, \quad j = 1, \ldots, m.$$

To get the form (1), introduce slack variables s and use the flexibility of 
Lagrangian relaxation: set $X' := X \times \mathbb{R}^m_+$, so that we have to dualize 

$$\max f(x), \quad x \in X, \ s \geq 0 \in \mathbb{R}^m, \quad c(x) + s = 0 \in \mathbb{R}^m.$$

The Lagrangian is 

$$L'(x, s, u) = f(x) - u^\top (c(x) + s) = L(x, u) - u^\top s,$$

where L(x,u) is the “ordinary” Lagrangian (as though we had equalities). The 
resulting dual function $\theta'$ is clearly 

$$\theta'(u) = \begin{cases} +\infty & \text{if some } u_j < 0, \\ \theta(u) & \text{otherwise,} \end{cases}$$

where $\theta$ is the “ordinary” dual function. In a word, the dual problem becomes 
$\min_{u \geq 0} \theta(u)$: the negative part of the dual space is chopped off, and we have an 
immediate illustration of note 1. □ 

To bypass the above reasoning, it suffices to memorize the following rule: 

- consider a maximization problem 

- and dualize a constraint c by subtracting the term $u\,c(x)$; 

- if the constraint was of the “lower than” type ($c(x) \leq 0$), 

- then the corresponding multiplier u must be nonnegative 

- (one may prefer a minimization problem, with a + sign in the Lagrangian, and 
$\leq$-type constraints; the multipliers must again be nonnegative; memorizing the 
appropriate rule is a matter of taste). 

Along the lines of Remark 3, suppose that one insists on having f linear in 
(1): adopt the formulation in the new universe $X \times \mathbb{R}$ 

$$\max s_0, \quad s_0 \leq f(x), \quad x \in X, \quad c(x) = 0 \in \mathbb{R}^m.$$

A first possibility is to insert the constraint $s_0 \leq f(x)$ in the universe: we form 
the Lagrangian $s_0 - u^\top c(x)$, whose maximum over $x \in X$ and $s_0 \leq f(x)$ is clearly 
$\theta(u)$; nothing is changed. 

Alternatively, we can dualize that constraint, forming the Lagrangian 

$$X \times \mathbb{R} \times \mathbb{R}^{m+1} \ni (x, s_0, u, u_0) \mapsto L'(x, s_0, u, u_0) := s_0 - u_0 [s_0 - f(x)] - u^\top c(x).$$

Maximizing it with respect to $s_0$ (unconstrained!) gives $+\infty$ if $u_0 \neq 1$; and for 
$u_0 = 1$, its maximum with respect to $x \in X$ gives $\theta(u)$; again nothing is changed. 

Remark 4 As a result, there is no loss of generality in assuming f linear in (1). 
In case the given objective is really nonlinear, just use the formulation deemed 
the most convenient (with or without $s_0$); they all give the same dual anyway. □ 



2 Examples 

The examples of this section are intended to familiarize the reader with the 
duality mechanism, but also to show its versatility. Indeed, the simple principles 
of §1.2 above cover a wide range of techniques; see also [6^ Chap. 4] for a number 
of applications issued from operations research. 

2.1 Linear Programming 

Take a linear program in standard form: 

$$\max b^\top x, \quad x \geq 0 \in \mathbb{R}^n, \quad Ax = a \in \mathbb{R}^m.$$

Set first $X := \mathbb{R}^n_+$, $c(x) := Ax - a$: with $u \in \mathbb{R}^m$, we obtain the Lagrangian 
$L(x, u) = (b - A^\top u)^\top x + a^\top u$, whose maximum over $x \geq 0$ gives the dual function 

$$\theta(u) = \begin{cases} a^\top u & \text{if } A^\top u \geq b, \\ +\infty & \text{otherwise.} \end{cases}$$

The dual problem is therefore to minimize $a^\top u$, subject to $A^\top u \geq b$. 

Now set $X := \mathbb{R}^n$ and dualize all constraints: Ax - a = 0 (dual variable 
$u \in \mathbb{R}^m$) and $-x \leq 0$ (dual variable $v \in \mathbb{R}^n$). We obtain the Lagrangian 
$L'(x, u, v) = (b - A^\top u + v)^\top x + a^\top u$, to be maximized over x unconstrained. 
The dual function is therefore 

$$\theta'(u, v) = \begin{cases} a^\top u & \text{if } b - A^\top u + v = 0, \\ +\infty & \text{otherwise.} \end{cases}$$

Remembering that we also have $v \geq 0$, either from Remark 3 or from the rule 
following it, the dual problem is to minimize $a^\top u$ subject to $b - A^\top u = -v \leq 0$, 
which is exactly as before. 
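The dual function of the standard-form LP can be coded directly from its two-case formula. The sketch below is only illustrative: the data A, a, b form a hypothetical one-constraint instance, and the function names are not from the text.

```python
# Hypothetical instance: max 3*x1 + 2*x2 + x3  s.t.  x >= 0,  x1 + x2 + x3 = 1.
A = [[1.0, 1.0, 1.0]]   # m = 1 row, n = 3 columns
a = [1.0]
b = [3.0, 2.0, 1.0]

def theta(u):
    """Dual function for X = R^n_+: a^T u if A^T u >= b, +infinity otherwise."""
    reduced = [b[j] - sum(A[i][j] * u[i] for i in range(len(a)))
               for j in range(len(b))]
    if any(r > 0 for r in reduced):
        # Some component of b - A^T u is positive: the Lagrangian is
        # unbounded above on the nonnegative orthant.
        return float("inf")
    return sum(ai * ui for ai, ui in zip(a, u))

def primal_value(x):
    """Objective b^T x of a primal point."""
    return sum(bj * xj for bj, xj in zip(b, x))

# A few primal-feasible points (nonnegative, coordinates summing to 1).
feasible_points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.5, 0.5, 0.0)]

for u in ([2.0], [3.0], [4.5]):
    print(u, theta(u), [primal_value(x) <= theta(u) for x in feasible_points])
```

Here u = 3 is dual feasible and gives the bound 3, which is attained by the primal point (1, 0, 0); u = 2 violates $A^\top u \geq b$ and yields the trivial bound $+\infty$.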

We leave it as an exercise to dualize in various ways various forms of LP, and 
to realize in each case that the selected duality scheme has little influence on the 
resulting dual problem. For example, any conceivable duality scheme for 

$$\max b^\top x, \quad x \geq 0, \quad -\delta e \leq Ax - a \leq \delta e \tag{5}$$

($e \in \mathbb{R}^m$ is the vector of all ones, $\delta \geq 0$) eventually boils down to 

$$\min a^\top u + \delta \sum_{j=1}^m |u_j|, \quad A^\top u \geq b;$$

the simplest scheme is to set Ax - a = s and to dualize just that constraint. 

2.2 Quadratic Programming 

An example hardly more complicated is 

$$\max b^\top x - \tfrac{1}{2} x^\top Q x, \quad Ax \leq a,$$

where Q is a symmetric matrix. Take $u \geq 0$ and form the Lagrangian 

$$L(x, u) = (b - A^\top u)^\top x - \tfrac{1}{2} x^\top Q x + a^\top u.$$

To maximize it with respect to x unconstrained, one must consider three cases: 

- If Q is positive definite, there is a unique maximum $x_u = Q^{-1}(b - A^\top u)$; it 
produces the quadratic dual function 

$$\theta(u) = \tfrac{1}{2} u^\top A Q^{-1} A^\top u + (a - A Q^{-1} b)^\top u + \tfrac{1}{2} b^\top Q^{-1} b,$$

which is nicely convex and must be minimized over $u \geq 0$. 

- If Q is positive semidefinite, the Lagrangian has a finite maximum only if 
$b - A^\top u$ lies in the range of Q (so that the equation $Qx = b - A^\top u$ has a 
solution). This results in extra linear equality constraints in the dual problem. 

- If Q is indefinite, the “maximum” is always at infinity: $\theta(u) = +\infty$ for all u; 
the dual problem produces no better upper bound than $+\infty$. 
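The positive definite case can be checked by hand in one dimension. The following sketch uses hypothetical scalar data q, b, A, a (so Q, A are 1x1) and compares the closed-form dual function with a direct evaluation of the Lagrangian at $x_u$.

```python
# Hypothetical scalar instance: max 4x - x^2  s.t.  x <= 1  (q = 2, b = 4, A = 1, a = 1).
q, b, A, a = 2.0, 4.0, 1.0, 1.0

def lagrangian(x, u):
    """L(x, u) = (b - A u) x - q x^2 / 2 + a u."""
    return (b - A * u) * x - 0.5 * q * x * x + a * u

def x_of_u(u):
    """Unique unconstrained maximizer of L(., u) when q > 0."""
    return (b - A * u) / q

def theta_closed_form(u):
    """Scalar version of the quadratic dual function of the text."""
    return 0.5 * (A * A / q) * u * u + (a - A * b / q) * u + 0.5 * b * b / q

for u in (0.0, 1.0, 2.0, 3.5):
    xu = x_of_u(u)
    print(f"u = {u:3.1f}  x_u = {xu:5.2f}  "
          f"L(x_u, u) = {lagrangian(xu, u):6.3f}  theta(u) = {theta_closed_form(u):6.3f}")
```

For these data the dual function is $\theta(u) = u^2/4 - u + 4$; it is minimal over $u \geq 0$ at u = 2, where $x_u = 1$ is primal feasible and $\theta(2) = 3$ equals the primal optimal value.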

This example reveals a situation (the case of Q indefinite, or even $Q \prec 0$) 
where Lagrangian relaxation is of little help for finding upper bounds. Of course, 
the trouble here comes from a highly nonconvex primal problem; this will be seen 
in more detail below. 


2.3 Max-cut, Max-stable, and Quadratic Constraints 

Here we apply Lagrangian relaxation to two special combinatorial optimization 
problems. Both are particular instances of the general class (9) below, which also 
lends itself to the approach. 

The max-cut problem can be written 

$$\min \sum_{i,j=1}^n Q_{ij} x_i x_j = x^\top Q x, \quad x_i^2 = 1, \quad i = 1, \ldots, n. \tag{6}$$

Needless to say, the constraints $x_i^2 = 1$ just express that $x_i = \pm 1$. The symmetric 
matrix Q has nonnegative coefficients, but this is irrelevant for our purpose. 

The above problem has the form (1) with $X := \mathbb{R}^n$ and $c_j(x) := x_j^2 - 1$. 
Form the Lagrangian 

$$L(x, u) = x^\top Q x - \sum_{i=1}^n u_i (x_i^2 - 1) = x^\top (Q - D(u)) x + e^\top u; \tag{7}$$

here D(u) denotes the diagonal matrix constructed from the vector u, e is the 
vector of all ones. Minimizing L with respect to x (on the whole of $\mathbb{R}^n$!) is a trivial 
operation: if Q - D(u) is not positive semidefinite, we obtain $-\infty$; otherwise, 
the best to do is to take x = 0. Now (6) is a minimization problem, for which 
the dual function provides lower bounds; this dual function must be maximized. 
In a word, the dual problem associated with (6) is 

$$\max e^\top u, \quad \text{subject to } Q - D(u) \succeq 0. \tag{8}$$
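Any u feasible in (8) gives a lower bound on (6), even if it is not optimal. The sketch below builds one such u without an eigenvalue solver, using a Gershgorin-type choice (an assumption of this example, not a method from the text): taking $u_i = Q_{ii} - \sum_{j \neq i} |Q_{ij}|$ makes $Q - D(u)$ diagonally dominant with nonnegative diagonal, hence positive semidefinite. The 3x3 matrix Q is hypothetical.

```python
from itertools import product

# Hypothetical symmetric weight matrix with nonnegative coefficients.
Q = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 3.0],
     [2.0, 3.0, 0.0]]
n = len(Q)

def quad(x):
    """x^T Q x."""
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def lagrangian(x, u):
    """L(x, u) = x^T Q x - sum_i u_i (x_i^2 - 1), as in (7)."""
    return quad(x) - sum(u[i] * (x[i] * x[i] - 1.0) for i in range(n))

# Gershgorin-type dual point: Q - D(u) is diagonally dominant, hence PSD.
u = [Q[i][i] - sum(abs(Q[i][j]) for j in range(n) if j != i) for i in range(n)]
dual_bound = sum(u)                      # e^T u, the dual objective in (8)

# Brute-force primal optimum of (6) over x in {-1, +1}^n.
primal_opt = min(quad(x) for x in product((-1, 1), repeat=n))

print("dual-feasible u :", u)
print("lower bound     :", dual_bound)
print("primal optimum  :", primal_opt)
```

The bound obtained this way (here -12 against a true optimum of -8) is generally weaker than the optimal value of (8); it only illustrates that feasibility in (8) is enough for a valid bound.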

An interesting point, to be seen in §2.4 below, is that the optimal value of 
this SDP (semi-definite programming) problem is just the SDP bound of [20]. This 
gives an incentive to dualize more general problems with quadratic constraints, 
such as 

$$\begin{cases} \max x^\top Q_0 x + 2 b_0^\top x + c_0, \\ x^\top Q_j x + 2 b_j^\top x + c_j = 0, \quad j = 1, \ldots, m. \end{cases} \tag{9}$$

The Lagrangian is the quadratic function $L(x, u) = x^\top Q(u) x + 2 b(u)^\top x + c(u)$, 
where $Q(u) := Q_0 - \sum_{j=1}^m u_j Q_j$, and likewise for b(u), c(u). Maximizing $L(\cdot, u)$ 
is slightly more complicated than for maxcut, due to the linear term $2 b(u)^\top x$; 
but the idea is just the same and gives rise again to SDP programming. In fact it 
can be shown that the dual of (9) is the problem with variables $(u, r) \in \mathbb{R}^m \times \mathbb{R}$ 

$$\min r, \quad \begin{pmatrix} c(u) - r & b(u)^\top \\ b(u) & Q(u) \end{pmatrix} \preceq 0. \tag{10}$$

Such quadratic duality goes far back, at least to m, continuing with m, and 
more recently [38]. 

This duality tool, available for general quadratically constrained problems, 
can be applied to other combinatorial problems fitting in the same framework. 
One is the maximum stable set problem, which can be formulated as: 

$$\max w^\top x, \quad x_i x_j = 0, \ (i, j) \in A, \quad x_i^2 = x_i, \ i = 1, \ldots, n$$

($A \subset \{1, \ldots, n\}^2$ being the set of arcs in a graph). Just as for maxcut, the 
bound obtained by dualizing the quadratic constraints is a known one: Lovász’ 
$\vartheta$ number m-. For a systematic use of relaxation with quadratic constraints in 
combinatorial optimization, see [60138]. 



2.4 Conic Duality 

Remark 3 is just a particular case of the following variant of (1): 

$$\max f(x), \quad x \in X, \quad c(x) \in K, \tag{11}$$

where K is a given set in $\mathbb{R}^m$. Here we assume that K is a closed convex cone 
(but see Remark 5 below). To introduce a dual problem, use again slack variables 
to put (11) in the form (1): 

$$\max f(x), \quad (x, s) \in X \times K, \quad c(x) - s = 0 \in \mathbb{R}^m.$$

Again with $u \in \mathbb{R}^m$, we form the Lagrangian $L'(x, s, u) = L(x, u) + u^\top s$, where 
$L(x, u) = f(x) - u^\top c(x)$ as in (2). Then the dual function is 

$$\theta'(u) = \max_{(x, s) \in X \times K} L'(x, s, u) = \theta(u) + \max_{s \in K} u^\top s, \tag{12}$$

with $\theta$ of (3). The term $\max_{s \in K} u^\top s$ is clearly 0 for u in the so-called polar cone of 
K: 

$$K^\circ := \{u \in \mathbb{R}^m : u^\top s \leq 0 \text{ for all } s \in K\}, \tag{13}$$

and $+\infty$ elsewhere: our dual problem is to minimize $\theta(u)$ over $u \in K^\circ$. This is 
the whole business of conic duality, as developed for example in [m Chap. 8] 
or PI Chap. 6]; see also [251 §XII.5.3(a)]. It contains among other things SDP 
duality, as developed for example in HEg. 

Remark 5 One may ask why K should be a closed convex cone. Actually, (12) 
gives the fully general dual function $\theta' = \theta + \sigma_K$, where $\sigma_K(u) := \sup_{s \in K} u^\top s$ 
is the so-called support function of K (a fundamental object of convex analysis). 
Arbitrary K’s are therefore allowed. However, a support function does not 
distinguish between a set and its closed convex hull, a consequence of which is: 

- there is no loss of generality in assuming K closed convex; 

- said otherwise: the dual problem will not change if K is replaced by its closed 
convex hull. 

Needless to say, just set K = {0} in (11) to obtain (1); the set $\{0\}^\circ$ of feasible 
u’s is then the whole space (alternatively, observe that $\sigma_{\{0\}} \equiv 0$). In the case of 
Remark 3, K is the nonpositive orthant, whose polar is the nonnegative orthant. In 
fact, the support function of a cone K is easily computed: it is 0 on $K^\circ$ and $+\infty$ 
elsewhere - hence the dual constraint $u \in K^\circ$ in this case. 

As an illustration of this remark, we leave it as an exercise to dualize the 
following variant of (1) - or of (11) with $K = [-\delta, \delta]^m$, see also (5): 

$$\max f(x), \quad x \in X, \quad \|c(x)\|_\infty \leq \delta.$$

□ 
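For the reader who wants to check the exercise, here is one possible worked answer, obtained by applying formula (12) with the box $K = [-\delta, \delta]^m$; the supremum is attained at $s_j = \delta\,\mathrm{sign}(u_j)$.

```latex
\begin{aligned}
\sigma_K(u) &= \sup_{\|s\|_\infty \le \delta} u^\top s
             = \delta \sum_{j=1}^m |u_j| = \delta \|u\|_1 , \\
\theta'(u)  &= \theta(u) + \sigma_K(u) = \theta(u) + \delta \|u\|_1 , \\
\text{dual problem:} \quad & \min_{u \in \mathbb{R}^m} \; \theta(u) + \delta \|u\|_1 .
\end{aligned}
```

With the linear data of (5) and $X = \mathbb{R}^n_+$, one has $\theta(u) = a^\top u$ when $A^\top u \geq b$ and $+\infty$ otherwise, so this is consistent with the dual problem stated after (5).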



As a conclusion, we reproduce the rule following Remark 3: 

- consider a maximization problem 

- and dualize a constraint by subtracting the linear term $u^\top c(x)$; 

- if the constraint was $c(x) \in K$, a cone, 

- then the corresponding multiplier u is constrained to vary in $K^\circ$ of (13) 

- (or, for a general K, add $\sigma_K$ to the dual function). 

Take (8) as a particular case: the constraint-value c'(u) := Q - D(u) lies in the 
space of symmetric matrices (we use primes to recall that (8) is already a dual 
problem; its dual will be a “bidual” of (6)). Symmetric matrices form a finite- 
dimensional vector space, which can be made Euclidean with the natural scalar 
product $\langle M, N \rangle := \sum_{i,j} M_{ij} N_{ij}$. Introducing a (bi)dual variable - a symmetric 
matrix, call it X - we form the (bi)Lagrangian 

$$L'(u, X) = e^\top u - \langle X, Q - D(u) \rangle = e^\top u + \sum_{i=1}^n X_{ii} u_i - \langle Q, X \rangle,$$

to be maximized over $u \in \mathbb{R}^n$. This gives the (bi)dual function 

$$\theta'(X) = \begin{cases} -\langle Q, X \rangle & \text{if } X_{ii} = -1 \text{ for } i = 1, \ldots, n, \\ +\infty & \text{otherwise.} \end{cases}$$

Now K is here the closed convex cone of positive semidefinite matrices. It can 
be shown ({^7\ Cor. 7.5.4] for example) that its polar is the (closed convex) cone 
of negative semidefinite matrices, which imposes the constraint $X \preceq 0$. Finally, 
change X to -X to make the (bi)dual nicer: we obtain 

$$\min \langle Q, X \rangle, \quad X \succeq 0, \quad X_{ii} = 1, \ i = 1, \ldots, n. \tag{14}$$

This is the SDP relaxation of (6). Note that a positive semidefinite matrix can 
be put in the form $X = \sum_{k=1}^r x_k x_k^\top$, where r is the rank of X. Observing that 
$\langle Q, x x^\top \rangle = x^\top Q x$ and that $(x x^\top)_{ii} = x_i^2$, we see that (6) is just (14), but with 
X constrained to have the form $X = x x^\top$, i.e. to be of rank 1. 



2.5 A Large-Scale Problem: Unit-Commitment 

Perhaps the most obvious usage of Lagrangian relaxation is when (1) has many 
variables, which would become independent if the constraints c were not present. 
For an illustration consider the so-called unit-commitment problem, which 
consists in optimizing the production planning of a set I of power plants, over some 
time horizon T. A possible formulation is as follows (see |4^ and the references 
therein): 

- the control variables are $x_i^t$ for $i \in I$ and $t = 1, \ldots, T$; they denote the 
production level of unit i at time t; $x_i$ [resp. $x^t$] will stand for the vector $(x_i^t)_{t=1}^T$ 
[resp. $(x_i^t)_{i \in I}$]; 

- each unit i can operate within a certain feasible set $X_i$ representing the 
operating dynamic constraints (for a nuclear plant, $X_i$ is a finite set of possible 
plannings: $x_i^t$ may take a few possible values, say $x_i^t \in \{500, 1000, 1500, 2000\}$ MW; 
when the production has reached a certain level, it must stay there for a certain 
number of time periods, etc.); 

- at each time t, the demand must be satisfied; these are the so-called static 
constraints, denoted by $c^t(x^t) \leq 0$; usually, the functions $c^t$ are additive, say 
$c^t(x^t) = \sum_{i \in I} c_i^t(x_i^t)$, or even affine; 

- finally, the cost to produce $x_i$ by unit i over the whole time horizon is $C_i(x_i)$. 

Then the problem is 

$$\min \sum_{i \in I} C_i(x_i), \quad x_i \in X_i, \ i \in I, \quad c^t(x^t) \leq 0, \ t = 1, \ldots, T. \tag{15}$$

Insofar as the static constraints are additive, dualizing them produces an 
additive Lagrangian: with $u = (u^1, \ldots, u^T)$, 

$$L(x, u) = \sum_{i \in I} C_i(x_i) + \sum_{t=1}^T u^t c^t(x^t) = \sum_{i \in I} C_i(x_i) + \sum_{t=1}^T \sum_{i \in I} u^t c_i^t(x_i^t).$$

After some reordering, we see that L is a sum of “local” Lagrangians, depending 
only on $x_i$: minimizing $L(\cdot, u)$ over $X := \prod_{i \in I} X_i$ gives one problem per plant: 

$$\min_{x_i \in X_i} \; C_i(x_i) + \sum_{t=1}^T u^t c_i^t(x_i^t),$$

which is a lot easier than (15) (to fix ideas, the French production set has some 
150 plants working each day; optimizing each of them separately is a reasonable 
task, while optimizing them altogether is inconceivable). 
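The decomposition can be observed on a very small instance. The sketch below is purely illustrative: two hypothetical plants with finite sets of plans, linear costs, T = 2, and affine static constraints written as $c^t(x^t) = d^t - \sum_i x_i^t \leq 0$ (demand $d^t$ must be covered). It compares the dual function computed plant by plant with a brute-force minimization of the Lagrangian over all joint plans.

```python
from itertools import product

# Hypothetical data: feasible plans X_i (one production level per period),
# linear costs C_i, and a demand d^t for each of the T = 2 periods.
plans = {
    "plant_a": [(0, 0), (500, 500), (1000, 1000)],
    "plant_b": [(0, 0), (0, 300), (300, 0), (300, 300)],
}
unit_cost = {"plant_a": 1.0, "plant_b": 2.5}
demand = (600, 900)
names = list(plans)

def cost(i, xi):
    """C_i(x_i): cost of plan xi for plant i."""
    return unit_cost[i] * sum(xi)

def local_min(i, u):
    """Local problem of plant i: min over X_i of C_i(x_i) - sum_t u^t x_i^t."""
    return min(cost(i, xi) - sum(ut * xt for ut, xt in zip(u, xi))
               for xi in plans[i])

def theta_decomposed(u):
    """Dual function as a constant plus one independent problem per plant."""
    return sum(ut * dt for ut, dt in zip(u, demand)) + sum(local_min(i, u) for i in names)

def theta_joint(u):
    """Same dual function by brute force over all joint plans (for checking only)."""
    best = float("inf")
    for combo in product(*(plans[i] for i in names)):
        value = sum(cost(i, xi) for i, xi in zip(names, combo))
        value += sum(ut * (dt - sum(xi[t] for xi in combo))
                     for t, (ut, dt) in enumerate(zip(u, demand)))
        best = min(best, value)
    return best

def primal_optimum():
    """Brute-force optimum of the toy version of (15)."""
    return min(sum(cost(i, xi) for i, xi in zip(names, combo))
               for combo in product(*(plans[i] for i in names))
               if all(sum(xi[t] for xi in combo) >= demand[t] for t in range(len(demand))))

for u in ((0.0, 0.0), (1.5, 2.0), (3.0, 0.5)):
    print(u, theta_decomposed(u), theta_joint(u), primal_optimum())
```

Since the toy problem is a minimization, each value of the dual function is a lower bound on the primal optimum, as the printed figures show.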

Remark 6 The above example illustrates perfectly the usual situation in 
Lagrangian relaxation: the only available information concerning (1) - here (15) - 
is the oracle which, given u, maximizes the Lagrangian $L(\cdot, u)$; and this oracle 
may be fairly complex. Here it is itself composed of some 150 “local oracles”, all 
probably very different (nuclear, gas, hydraulic, ...) and probably very 
sophisticated, at least for laymen in electrical engineering. □ 


2.6 Column Generation in Linear Programming 

Consider the following type of problem. We have a very long list of points 
$x_1, \ldots, x_\kappa$ in $\mathbb{R}^n$, where $\kappa$ is some enormous number, possibly $+\infty$. We want 
to find an $x_k$ maximizing a linear function, say $b^\top x_k$, and satisfying affine 
constraints, say $A x_k = a \in \mathbb{R}^m$: we write the problem as 

$$\max \{ b^\top x_k : A x_k = a, \ k = 1, \ldots, \kappa \}. \tag{16}$$

Assumption 7 Column generation requires the following conditions: 

(i) The number m of constraints is reasonable (as opposed to the cardinality 
$\kappa$ of the list). 

(ii) Rather than solving the initial problem, one agrees to relax it, replacing 
the set $\{x_1, \ldots, x_\kappa\}$ by its convex hull. 

(iii) It is “easy” to maximize linear functions over the $x_k$’s, i.e. to solve the 
unconstrained version of (16): $\max \{ c^\top x_k : k = 1, \ldots, \kappa \}$, for given $c \in \mathbb{R}^n$ 
(possibly different from b). □ 

In view of (ii), one is interested in upper bounds, as in Lagrangian relaxation. 
As for (iii), it says that one can compute the support function (see Remark 5) 
of the $x_k$’s - or of their convex hull, which is the same function. In other words, 
the constraints $A x_k = a$ are complicating. Clearly the situation resembles (1). 

We refer to 173 Chap. 11] and the references therein for the many 
applications of column generation. Let us just mention: cutting stock, where each 
$x_k$ represents a cut pattern; network optimization, a path from a source to a 
destination; facility location, a set of clients assigned to a depot; etc. 

Remark 8 Actually, Assumption 7 is just what combinatorial optimization is 
all about, each $x_k$ representing one among many possible candidates to solving a 
certain problem. The most naive instance is the standard 0-1 linear programming 
problem 

$$\max b^\top x, \quad x \in \{0, 1\}^n, \quad Ax = a. \tag{17}$$

After all, it can be formulated with notation of the present section: we want to 
find a 0-1 vector $x \in \mathbb{R}^n$ (there are $\kappa = 2^n$ of them) maximizing a linear 
function and satisfying affine constraints. Barring these last constraints, the problem 
would become trivial. 

Admittedly, column generation may not be deemed most appropriate for the 
present model, though. It is more constructive to distinguish two groups of 
constraints in (17); we write 

$$\max b^\top x, \quad x \in \{0, 1\}^n, \quad Dx = d, \quad Ax = a,$$

the $x_k$’s being now the 0-1 points satisfying Dx = d. The approach is still feasible 
if, among other things, D is structured enough, so that it is easy to maximize a 
linear function $c^\top x$ on 0-1 vectors satisfying Dx = d. Dispatching a given set of 
constraints among the above A and D belongs to the art of 0-1 programming. □ 

Let us formulate (16) more compactly as 

$$\max b^\top x, \quad x \in X, \quad Ax = a. \tag{18}$$

This has the advantage of accepting $\kappa = +\infty$; but more importantly, the 
connection with (1) becomes more blatant: the universe X is made of the $x_k$’s or, in 
view of Assumption 7(ii), of their convex hull. The objective is linear and the 
constraints dualized in (2) are affine. In view of Assumption 7(iii), the Lagrangian 
problem (3), which is here $\max_{x \in X} b^\top x - u^\top (Ax - a)$, is “easy”: Assumption 7(iii) 
is nothing other than Assumption 2. 

Now column generation is a special technique to solve (18), in which X is 
iteratively replaced by smaller sets $X_K \subset X$: (18) is replaced by the master 
program 

$$\max b^\top x, \quad x \in X_K, \quad Ax = a. \tag{19}$$

A bounded polyhedron is taken for $X_K$, so that (19) is an ordinary linear 
program. Also, the extreme points of $X_K$ are extracted from the $x_k$’s in (16); but 
this is of no importance for the moment anyway: what is interesting is how $X_K$ 
is updated for the next iteration. 

To obtain $X_{K+1}$ in column generation, one gets from the resolution of (19) an 
m-vector $u_K$ (the multipliers associated with Ax = a), and one solves the 
satellite program: one finds the most positive among the numbers $(b - A^\top u_K)^\top x_k$, 
$k = 1, \ldots, \kappa$, which, in view of Assumption 7(iii), is an “easy” problem. Using 
notation as in (18) and neglecting the constant term $a^\top u_K$, this problem 
amounts to solving 

$$\max_{x \in X} \; (b - A^\top u_K)^\top x \ [+\, a^\top u_K] = \max_{x \in X} \; b^\top x - u_K^\top (Ax - a) = \max_{x \in X} L(x, u_K).$$

We see that the satellite does nothing other than computing the dual function 
$\theta(u_K)$ as in (3). Let us summarize these observations: 

Fact 9 Insofar as (1) has linear data (X polyhedral, f linear, c affine), column 
generation and Lagrangian relaxation do the same thing with the same tool, 
starting from opposite premises: 

- In Lagrangian relaxation, the constraints c(x) = 0 are relaxed and the universe 
X is kept as “hard”; the resulting problem is solved in the oracle (3). 

- Column generation works the other way round: the constraints c(x) = 0 are 
kept as “hard”, while the universe is restricted to a smaller set; the resulting 
problem is solved in the master (19). □ 
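The identity between the satellite program and the dual function can be sketched as follows. The list of points, the data b, A, a and the function names are hypothetical; the satellite is a plain scan of the list, which stands for the “easy” oracle of Assumption 7(iii).

```python
# Hypothetical list of points x_k in R^2 and linear data, with m = 1 constraint.
points = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.0, 0.0)]
b = (2.0, 1.0)
A = [(1.0, 1.0)]      # single row: x1 + x2 = a
a = (1.0,)

def satellite(u):
    """Return (k, s): index of the point maximizing (b - A^T u)^T x_k, and that maximum s."""
    reduced = [b[j] - sum(A[i][j] * u[i] for i in range(len(a))) for j in range(len(b))]
    values = [sum(r * x for r, x in zip(reduced, xk)) for xk in points]
    k = max(range(len(points)), key=values.__getitem__)
    return k, values[k]

def theta(u):
    """Dual function: satellite output plus the constant term a^T u."""
    _, s = satellite(u)
    return s + sum(ai * ui for ai, ui in zip(a, u))

for u in ((0.0,), (1.5,), (3.0,)):
    k, s = satellite(u)
    print(f"u = {u[0]:3.1f}  best point = {points[k]}  s = {s:4.1f}  theta(u) = {theta(u):4.1f}")
```

In this toy instance the best point of the list satisfying Ax = a is (1, 0), with value 2, and the multiplier u = 1.5 gives the matching bound $\theta(u) = 2$.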

Thus, any formulation of a problem in terms of column generation can be 
done in terms of Lagrangian relaxation (and conversely). We believe that this 
is important because Lagrangian relaxation - duality theory - mostly calls for 
fairly simple concepts, to be seen in §3. By contrast, column generation has to 
call for the often fussy language of linear programming. In particular, it cannot 
be accounted for directly on the compact formulation (19). Remember our point 
(i) in §1.1. 

Anyway Fact 9 has several interesting consequences: 

- Column generation can be applied to more general problems than (18). In fact, 
the resulting methodology is often called generalized linear programming, as is 
lucidly explained in m-. 

- Because column generation actually convexifies X in (18), Lagrangian 
relaxation should do the same. This will be confirmed in mi below; see more 
precisely Theorem llTl Fact ITS] Example 1221 

- In Lagrangian relaxation, the dual function $\theta$ of (3) must be minimized. Now 
we have seen that, in the column generation language, $\theta(u_K) = s + a^\top u_K$, 
where s is the output from the satellite. The question at stake in column 
generation is therefore to find multipliers $u_K$ such that $s + a^\top u_K$ - i.e. $\theta$ - is 
minimal. 

- In other words, column generation is by necessity a minimization mechanism. 
The master (19) must provide a possible algorithm to minimize $\theta$; this will be 
confirmed in §4.2 below. 

- Alternatively, any algorithm to minimize $\theta$ must provide an alternative to the 
master (19) for column generation; this will be confirmed in §4.3 and §5.2 below: 
see more precisely Remark 33. 

2.7 Entropy Maximization 

Even though the following example is definitely out of the combinatorial world, 
we mention it to emphasize once more the versatility of Lagrangian relaxation. 
It is indeed an optimization technique that should always be kept in mind. 

Entropy maximization appears all the time when an unknown function x(t), 
$t \in [0, 1]$, must be identified with the help of some measurements. Then one seeks 
the “most probable” $x(\cdot)$ compatible with these measurements. Here is a typical 
example, where the measurements are some Fourier coefficients of x: 

$$\min \int_0^1 x(t) \log x(t)\, dt, \quad \int_0^1 \cos(2 j \pi t)\, x(t)\, dt = z_j, \quad j = 1, \ldots, m.$$

This problem has infinitely many variables but finitely many constraints. 
Changing signs (we have a minimization problem) and forgetting the term $-\sum_j u_j z_j$ 
(constant in x), the Lagrangian L(x, u) is the integral over [0, 1] of the function 

$$\ell(x(t), u) := x(t) \log x(t) + \sum_{j=1}^m u_j \cos(2 j \pi t)\, x(t) = x(t) \log x(t) + g_u(t)\, x(t).$$

Skipping infinite-dimensional technicalities, we can admit the (true) fact that 
minimizing $L(\cdot, u)$ is achieved by minimizing $\ell(\cdot, u)$ for each t. This last problem 
is straightforward indeed: differentiate $\ell$ with respect to its first argument and 
solve for x(t). We obtain $1 + \log x(t) + g_u(t) = 0$, which has the unique solution 

$$x_u(t) := e^{-1 - g_u(t)} = \exp\Big(-1 - \sum_{j=1}^m u_j \cos(2 j \pi t)\Big).$$

Plugging this value (always positive!) in the expression of $\ell$ gives an explicit 
formula for the dual function. See [2] for more details and references. 
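The pointwise minimization can be checked numerically. The sketch below codes $g_u$, the integrand $\ell$ and the candidate minimizer $x_u(t)$; the multiplier vector used in the loop is an arbitrary hypothetical choice.

```python
import math

def g(u, t):
    """g_u(t) = sum_j u_j cos(2 j pi t), with j running from 1 to m = len(u)."""
    return sum(uj * math.cos(2 * (j + 1) * math.pi * t) for j, uj in enumerate(u))

def ell(x, u, t):
    """Integrand of the Lagrangian: x log x + g_u(t) x, for x > 0."""
    return x * math.log(x) + g(u, t) * x

def x_u(u, t):
    """Pointwise minimizer obtained from 1 + log x + g_u(t) = 0."""
    return math.exp(-1.0 - g(u, t))

u = (0.3, -0.7)                      # hypothetical multipliers, m = 2
for t in (0.0, 0.13, 0.5, 0.9):
    xs = x_u(u, t)
    # Stationarity, and comparison with two nearby positive values.
    print(f"t = {t:4.2f}  x_u = {xs:7.4f}  ell = {ell(xs, u, t):8.4f}  "
          f"nearby: {ell(0.99 * xs, u, t):8.4f}, {ell(1.01 * xs, u, t):8.4f}")
```

A direct substitution gives $\ell(x_u(t), u) = -x_u(t)$, which the printed values reproduce.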
3 Minimal Amount of Convex Analysis 

As seen in §1.2, Lagrangian relaxation just replaces a primal problem (1) by its 
dual (3)-(4). We will first give the main properties of the dual problem. Then 
we will consider questions posed in the primal when the dual problem is solved: 
how good is the upper bound? how about recovering a primal optimal solution? 
Some familiarity with elementary convex analysis is required here. For this, |2tij 
may be useful. 

3.1 Minimizing the Dual Function 

In this section, we look at (3)-(4) with dual glasses, concentrating on the space 
$\mathbb{R}^m$ of dual variables, and somehow forgetting the primal space $\mathbb{R}^n$. Here is the 
fundamental result, whose proof is immediate: 

Theorem 10 Whatever the data X, f, c in (1) can be, the function $\theta$ is always 
convex and lower semicontinuous. 

Besides, if u is such that (3) has an optimal solution $x_u$ (not necessarily 
unique), then $g_u := -c(x_u)$ is a subgradient of $\theta$ at u: 

$$\theta(v) \geq \theta(u) + g_u^\top (v - u) \quad \text{for any } v \in \mathbb{R}^m, \tag{20}$$

which is written as $g_u \in \partial\theta(u)$. □ 

Some comments are useful for a good understanding of this result. 

- An important object in convex analysis is the epigraph of a function $\theta$, denoted 
by epi $\theta$: this is the set of $(u, r) \in \mathbb{R}^m \times \mathbb{R}$ such that $\theta(u) \leq r$. It is not difficult 
to realize that convex functions are exactly those whose epigraph is a convex 
subset of $\mathbb{R}^m \times \mathbb{R}$. 

- Lower semicontinuity means: for any $u \in \mathbb{R}^m$, the smallest cluster value of $\theta(v)$ 
when $v \to u$ is at least $\theta(u)$: $\liminf_{v \to u} \theta(v) \geq \theta(u)$. Alternatively, a function 
is lower semicontinuous when its epigraph is a closed subset of $\mathbb{R}^m \times \mathbb{R}$. 

- Lower semicontinuity is an important property for minimization. It guarantees 
existence of a minimum point as soon as $\theta$ “increases at infinity”. By contrast, 
a function that is not lower semicontinuous may fail to have a minimum point 
(over $u \geq 0$, say): minimizing such a function is meaningless. 

- In our $L \to \theta$ context, it helps intuition to think of X as a sort of index-set: 
instead of $\theta(u) = \max_{x \in X} f(x) - u^\top c(x)$, a writing more suggestive for 
our purpose would be: $\theta(u) = \max_{k \in \mathcal{K}} f_k - u^\top c_k$ (it goes with the column 
generation of §2.6 where $\mathcal{K} = \{1, 2, \ldots, \kappa\}$). Thus the Lagrangian defines a 
family of functions $L_k$ which are affine in u. 

- Now observe that taking the supremum of a family of functions corresponds 
to intersecting their epigraphs. 

- Then convexity and lower semicontinuity of $\theta$ become easy to accept: in fact, 
the epigraph of each function $u \mapsto L_k(u)$ is a closed convex set (a so-called 
half-space). Their intersection epi $\theta$ is again closed and convex. 

- As for the subgradient property (20), it is straightforward: by definition of $\theta$ 
and looking at (2), we have 

$$\theta(v) \geq L(x, v) = L(x, u) - c(x)^\top (v - u)$$

for all $x \in X$, including for $x = x_u$. 

- Observe that -c(x) is the vector of partial derivatives of L with respect to 
u. The subgradient relation gives a rigorous meaning to the following very 
informal statement: “the (total) derivatives of $\theta$ are the partial derivatives of 
L with respect to u”. 

Such a statement can be justified in very special situations: suppose that X 
is the space $\mathbb{R}^n$, that L is a smooth function of x and u, and that (3) has a 
unique solution $x_u$ which varies smoothly with u. Then write 

$$\frac{d\theta}{du}(u) = \frac{\partial L}{\partial u}(x_u, u) + \frac{\partial L}{\partial x}(x_u, u)\, \frac{d x_u}{du};$$

but because $x_u$ is a maximum point, the partials of L with respect to x vanish. 
This explains that, to differentiate $\theta$, it suffices to differentiate L with respect 
to u. 

An important consequence of Theorem 10 is that the dual problem is therefore 
always “easy”, insofar as (3) is easy to solve. First, it can be qualified as 
well-posed, in the sense that minimizing a convex function is well-posed; and this 
independently of (1), be it NP-hard or anything else. Second, under the sole 
condition that we are able to maximize the Lagrangian, we obtain “for free” 
the value $\theta(u)$ of the function, and the value $-c(x_u)$ of a subgradient; both 
pieces of information are important to solve the dual problem. 

As a result, solving a dual problem is exactly equivalent to minimizing a 
convex function with the help of the oracle (3) providing function- and 
subgradient-values. Sections 4 and 5 will be devoted to this problem, from a numerical point 
of view. 
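An oracle of this kind is easy to write when X is finite. The sketch below uses hypothetical toy data (X = {0,1}^3, m = 2 constraints); it returns the value of the dual function, a maximizer and the subgradient $-c(x_u)$, and a helper measures the slack in the subgradient inequality (20).

```python
from itertools import product

# Hypothetical toy data: X = {0,1}^3 and m = 2 relaxed constraints.
X = list(product((0, 1), repeat=3))

def f(x):
    return 5.0 * x[0] + 4.0 * x[1] + 3.0 * x[2]

def c(x):
    return (x[0] + x[1] + x[2] - 2.0, x[0] - x[2])

def oracle(u):
    """Return (theta(u), x_u, g_u) with g_u = -c(x_u), by enumeration of X."""
    def lagrangian(x):
        return f(x) - sum(uj * cj for uj, cj in zip(u, c(x)))
    x_u = max(X, key=lagrangian)
    g_u = tuple(-cj for cj in c(x_u))
    return lagrangian(x_u), x_u, g_u

def subgradient_gap(u, v):
    """theta(v) - [theta(u) + g_u^T (v - u)], which (20) says is nonnegative."""
    theta_u, _, g_u = oracle(u)
    theta_v, _, _ = oracle(v)
    return theta_v - theta_u - sum(gj * (vj - uj) for gj, vj, uj in zip(g_u, v, u))

grid = [(-1.0, 0.5), (0.0, 0.0), (2.0, -3.0), (3.5, 1.0), (6.0, 2.0)]
for u in grid:
    value, x_u, g_u = oracle(u)
    print(u, value, x_u, g_u, min(subgradient_gap(u, v) for v in grid))
```

Any method minimizing the dual function would call `oracle` repeatedly; only the pair (value, subgradient) is needed on the dual side.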



3.2 Primal-Dual Relations 

A natural question is now: the dual problem is easy, but what is it good for, in 
terms of the primal? We already know that it provides upper bounds but can 
we say more? and when can we obtain a primal optimal solution once the dual 
is solved? 



Everett’s Theorem. Let us start with a very elementary observation. Suppose 
we have maximized $L(\cdot, u)$ for a certain u, and thus obtained an optimal $x_u$, 
together with its constraint-value $c(x_u) \in \mathbb{R}^m$; set $g_u := -c(x_u)$ and take an 
arbitrary $x \in X$ such that $c(x) = -g_u$. By definition, $L(x, u) \leq L(x_u, u)$, which 
can be written 

$$f(x) \leq f(x_u) - u^\top c(x_u) + u^\top c(x) = f(x_u). \tag{21}$$

This is a useful result for approximate resolutions of (1): 

Theorem 11 (Everett) With the notation above, $x_u$ solves the following 
perturbation of (1): 

$$\max f(x), \quad x \in X, \quad c(x) = -g_u.$$

□ 

The above “a posteriori” perturbed problem consists in replacing the right- 
hand side of the constraints in (1) by the vector $-g_u$. If this vector is small, $x_u$ 
can be viewed as approximately optimal; if $g_u = 0$, we are done: $x_u$ is optimal. 
More can actually be said in Everett’s Theorem. 

- First, suppose $x_u$ is only an $\varepsilon$-maximizer of the Lagrangian; in other words: 
$L(x_u, u) \geq L(x, u) - \varepsilon$ for all $x \in X$. Then (21) becomes $f(x) \leq f(x_u) + \varepsilon$, and 
$x_u$ becomes an $\varepsilon$-optimal solution of the perturbed problem. 

- Second observe that, to obtain the desired property $f(x) \leq f(x_u)$ in (21), it 
suffices to have $u^\top (c(x) - c(x_u)) \leq 0$. As a result, $x_u$ solves the problem 

$$\max f(x), \quad x \in X, \quad u^\top c(x) \leq -u^\top g_u.$$

- Even more: $x_u$ solves any other problem having more constraints than above, 
provided these constraints are satisfied by $x_u$. Thus, consider the particular 
case of Remark 3; it is clear that $x_u$ solves the perturbed problem 

$$\max f(x), \quad x \in X, \quad c_j(x) \leq \begin{cases} \max\{0, c_j(x_u)\} & \text{if } u_j = 0, \\ c_j(x_u) & \text{if } u_j > 0, \end{cases}$$

in which optimality conditions appear explicitly: if $x_u$ is such that $c_j(x_u) \leq 0$ 
($x_u$ feasible) with equality whenever $u_j > 0$ (complementarity slackness), then 
$x_u$ solves the original problem of Remark 3. 
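Everett's observation can be verified by enumeration on a finite universe. The data below (X = {0,1}^3, two constraints, a handful of multipliers) are hypothetical; for each u the sketch computes a Lagrangian maximizer $x_u$ and checks that no point of X with the same constraint-value has a better objective.

```python
from itertools import product

# Hypothetical toy data: X = {0,1}^3, integer-valued constraints (m = 2).
X = list(product((0, 1), repeat=3))

def f(x):
    return 5.0 * x[0] + 4.0 * x[1] + 3.0 * x[2]

def c(x):
    return (x[0] + x[1] + x[2] - 2, x[0] - x[2])

def maximize_lagrangian(u):
    """Return some maximizer x_u of L(., u) over X."""
    return max(X, key=lambda x: f(x) - sum(uj * cj for uj, cj in zip(u, c(x))))

def solves_perturbed_problem(u):
    """Check Theorem 11: x_u maximizes f over {x in X : c(x) = c(x_u)}."""
    x_u = maximize_lagrangian(u)
    rhs = c(x_u)                                   # this is -g_u
    competitors = [x for x in X if c(x) == rhs]
    return all(f(x) <= f(x_u) + 1e-9 for x in competitors)

for u in ((0.0, 0.0), (2.0, -3.0), (3.5, 1.0), (6.0, 2.0)):
    x_u = maximize_lagrangian(u)
    print(u, x_u, c(x_u), solves_perturbed_problem(u))
```

When the printed constraint-value is (0, 0), the point $x_u$ is feasible in the original problem and therefore optimal for it.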



Dual Subdifferentials. The above considerations go back to [T^. Here we 
focus our attention to a differential study of the dual function. 

In view of Theorem 10, the partial gradient $\nabla_u L(x_u,u) = -c(x_u)$ of $L$ with
respect to $u$ lies in $\partial\theta(u)$. A converse to this property turns out to be
crucial for primal-dual relations. Introduce the notation

  $X(u) := \{x \in X : L(x,u) = \theta(u)\}$ ,
  $G(u) := \{g = -c(x) : x \in X(u)\}$          (22)

for the set of optimal solutions of the Lagrangian problem and its image through
$\nabla_u L$. Because a subdifferential is always a closed convex set, we deduce that
the closed convex hull of $G(u)$ is entirely contained in $\partial\theta(u)$. An
important question is whether $\partial\theta(u)$ is thus entirely described; besides,
getting rid of the (trouble-making) closure operation is desirable. This explains
the following concept:
Definition 12 (Filling Property) The filling property is said to hold at
$u \in \mathbb{R}^m$ if $\partial\theta(u)$ is the convex hull of the set $G(u)$ defined
in (22). □

A first question is then: when does the filling property hold? This is actually 
fundamental in subdifferential calculus: when is it possible to characterize the 
subdifferential of a sup-function? The question is extremely technical and will 
be skipped in this paper, as it has little to do with our development. To get a 
positive answer requires additional assumptions, of a topological nature. Let us 
enumerate some favourable situations: 

Theorem 13 The filling property of Definition 12 holds at any $u \in \mathbb{R}^m$

- when $X$ is a compact set on which $f$ and each $c_j$ are continuous;

- in particular, when $X$ is a finite set (usual Lagrangian relaxation of
combinatorial problems);

- in linear programming of §2.1 and in quadratic programming of §2.2;

- more generally, in problems where f and the cj are quadratic (^2J^) . or even 

Ip-norms, 1 ^ p ^ -|-oo; see mm - □ 

The interested reader may consult |M] for the most recent results along these 
lines. When the filling property holds at $u$, then any anti-subgradient of $\theta$ at
$u$ is a convex combination of constraint-values at sufficiently many $x$'s in
$X(u)$; say

  $g \in \partial\theta(u) \implies g = -\sum_k \alpha_k c(x_k)$ .

Now a $u$ minimizing $\theta$ is characterized by the property $0 \in \partial\theta(u)$,
which means: there exist sufficiently many $x$'s in $X(u)$ such that the $c(x)$'s
have 0 in their convex hull:

Proposition 14 Let $u^*$ solve the dual problem and suppose the filling property
holds at $u^*$. Then there are (at most $m+1$) points $x_k$ and convex multipliers
$\alpha_k$ such that

  $L(x_k,u^*) = \theta(u^*)$ for each $k$, and $\sum_k \alpha_k c(x_k) = 0$ .   □

With these $x_k$'s and $\alpha_k$'s, it is tempting to make up the primal point

  $x^* := \sum_k \alpha_k x_k$ ,          (23)

which definitely deserves attention. (Footnote: Of course, the writing (23) implies
that $X$ enjoys some structure, namely that its elements can be averaged. Besides,
observe the connotation of column generation: as a convex combination of primal
points computed by the satellite, $x^*$ could very well solve the master problem
with an appropriate $X_K$.)

In fact, the following reasoning reveals conditions under which this $x^*$ does
solve (1):
- If $X$ is a convex set (say in $\mathbb{R}^n$) then $x^* \in X$. Appropriate use of
Everett’s Theorem 11 tells us to what extent $x^*$ is approximately optimal in (1).

- If, in addition, $c$ is an affine mapping (from $\mathbb{R}^n$ to $\mathbb{R}^m$) then

  $c(x^*) = \sum_k \alpha_k c(x_k) = 0$ .

Thus $x^*$ is feasible in (1). Besides $L(x^*,u^*) = f(x^*) - (u^*)^\top c(x^*) = f(x^*)$,
so the number $\theta(u^*) - f(x^*)$ (nonnegative because of weak duality) again tells
us to what extent $x^*$ is approximately optimal. This number, the so-called
duality gap, will be studied in more detail below.

- If, in addition, $L(\cdot,u^*)$ is concave (as a function defined on $\mathbb{R}^n$) then

  $f(x^*) = L(x^*,u^*) \ge \sum_k \alpha_k L(x_k,u^*) = \theta(u^*)$ .

In view of weak duality, $x^*$ is optimal in (1).

Remark 15 The above reasoning can be reproduced in the case of conic duality
of §2.4, knowing that the dual optimality condition is no longer
$0 \in \partial\theta(u^*)$ but rather: there is $g \in \partial\theta(u^*)$ lying in the
normal cone to $K^\circ$ at $u^*$. Alternatively, one can use the “slackened
Lagrangian” $L'(x,y,u) = L(x,u) + u^\top y$, to be maximized over $X \times K$, and just
apply the same reasoning. We leave it as an exercise to do the calculations in
the inequality-constrained case of Remark 3. □

It is important to realize that we have here a constructive process to make 
primal optimal solutions: 

Fact 16 Let a dual algorithm produce $u^*$ solving the dual problem. Suppose that

- the filling property of Definition 12 holds at $u^*$,

- appropriate convexity properties hold for $X$, $c$ and $L(\cdot,u^*)$ so that the
above reasoning is valid.

Under the mere condition that this algorithm is able to declare optimality of
$u^*$, it provides through (23) a point $x^*$ which solves the primal problem (1). □

Let us emphasize this fact: unless the dual algorithm minimizing $\theta$ is a naive
heuristic, without even a proof of convergence, the information necessary to
compute $x^*$ - i.e. the $x_k$'s and $\alpha_k$'s in (23) - is certainly available
somewhere in this algorithm, namely in its stopping criterion: to detect
(approximate) optimality of $u^*$, the algorithm must know (approximate) maximizers
$x_k$ of $L(\cdot,u^*)$ and convex multipliers $\alpha_k$ such that
$g_{u^*} := -\sum_k \alpha_k c(x_k)$ is (approximately) zero.
More will be seen on this question in Section lST^ 

Footnote: The normal cone to $K^\circ$ at $u^*$ is the set of $g$ such that
$g^\top (v - u^*) \le 0$ for all $v \in K^\circ$; or equivalently, the set of $g \in K$
such that $g^\top u^* = 0$; see [25, Ex. III.5.2.6(a)] for example.
Duality Gaps. We have already encountered the duality gap: this is the
difference between the optimal values of the dual problem and of the primal
problem (1). It is always a nonnegative number because of weak duality. In view
of Proposition 14 and the reasoning following it (and remembering Fact 16), we
guess that the dual problem amounts to solving a certain convex relaxation
of (1). The duality gap would then be the corresponding increase of the optimal
value, and this is the object of the present section;
see also |T0] §16], |14I42| . 

The most general and best known result is as follows. Introduce the perturbation
function associated with (1), which is the optimal value as a function of
the righthand side of the constraints:

  $\mathbb{R}^m \ni g \mapsto v(g) := \max\{f(x) : x \in X ,\ c(x) = g\}$ .          (24)

Lagrangian relaxation amounts to replacing $v$ by its concave upper semicontinuous
hull: this is the smallest function $v^{**}$ which is concave, upper semicontinuous
(u.s.c.) and larger than $v$: $v^{**}(g) \ge v(g)$ for all $g \in \mathbb{R}^m$.
Alternatively, $v^{**}$ is the function whose hypograph (everything that is under the
graph) is the closed convex hull of the hypograph of $v$, as a subset of
$\mathbb{R}^m \times \mathbb{R}$. The role of $v$ and $v^{**}$ comes from the following result:

Theorem 17 The dual optimal value is the value at 0 of the concave u.s.c. hull
of the perturbation function: $\min\theta = v^{**}(0)$. □
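A one-dimensional toy instance makes this statement concrete. The sketch below is an illustration under our own assumptions (the data `f_vals`, `c_vals` and all function names are hypothetical): $X = \{0,1,2,3\}$ and $c(x) = x - 1.5$, so that no point of $X$ is feasible and $v(0) = -\infty$. The dual function $\theta(u) = \max_x [f(x) - u\,c(x)]$ is piecewise affine and convex, so its minimum is attained at an intersection of two of its affine pieces; the concave hull of $v$ at 0 is obtained by interpolating between a point with $c(x) < 0$ and a point with $c(x) > 0$.

```python
# Hypothetical toy data: X = {0, 1, 2, 3}, f concave on X, c(x) = x - 1.5.
f_vals = [0.0, 3.0, 4.0, 4.5]
c_vals = [x - 1.5 for x in range(4)]


def theta(u):
    """Dual function: maximum of the Lagrangian f(x) - u*c(x) over X."""
    return max(fx - u * cx for fx, cx in zip(f_vals, c_vals))


def dual_min():
    """Minimize theta by evaluating it at all pairwise line intersections.
    Valid here because c takes both signs, so theta grows at +/- infinity."""
    breakpoints = []
    n = len(f_vals)
    for i in range(n):
        for j in range(i + 1, n):
            if c_vals[i] != c_vals[j]:
                breakpoints.append(
                    (f_vals[i] - f_vals[j]) / (c_vals[i] - c_vals[j]))
    return min(theta(u) for u in breakpoints)


def concave_hull_at_zero():
    """Value at g = 0 of the concave hull of the points (c(x), f(x))."""
    points = list(zip(c_vals, f_vals))
    best = float("-inf")
    for g, val in points:
        if g == 0.0:
            best = max(best, val)
    for g1, v1 in points:
        for g2, v2 in points:
            if g1 < 0.0 < g2:
                lam = g2 / (g2 - g1)   # lam*g1 + (1-lam)*g2 = 0
                best = max(best, lam * v1 + (1.0 - lam) * v2)
    return best


if __name__ == "__main__":
    print(dual_min(), concave_hull_at_zero())
```

Both numbers coincide on this instance, although the primal problem itself is infeasible: the dual solves the convexified problem, not the original one.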

Similarly to Proposition 14, this result is completely general, as it assumes
no structure whatsoever on $X$ (while Fact 16, for example, implies that the word
“convex combination” has a meaning in $X$; remember the footnote attached to (23)).
In a way, it is minimal: it only requires the image-space $\mathbb{R}^m$ of the
constraints to be a Euclidean space and completely neglects the primal space. On
the other hand, it is rather abstract: visualizing the behaviour of the
perturbation function is not easy, in terms of the data $X$, $f$, $c$. This is why
some more specific properties are worth investigating when some structure is
present in the data.

When $X \subset \mathbb{R}^n$ (so that taking convex combinations in $X$ makes sense) it
becomes possible to analyze the duality gap in a more primal language.

Assume first that the Lagrangian is affine in $x$. Then its maximum values are
not changed if $X$ is replaced by its closed convex hull. In view of weak duality,
this implies the following:

Fact 18 For an instance of (1) with linear data:

  $\max b^\top x , \quad x \in X \subset \mathbb{R}^n , \quad Ax = a$ ,          (25)

denote by $\bar X$ the closed convex hull of $X$. The dual minimal value $\min\theta$ is
not smaller than the maximal value in (25) with $X$ replaced by $\bar X$. □

Footnote: The dual function is then essentially the support function $\sigma_X$ of
$X$, already seen in an earlier remark; more precisely, in the notation of (25),
$\theta(u) = a^\top u + \sigma_X(b - A^\top u)$.
Beware that “not smaller than” does not mean “equal to”: this result does
not mean that the duality gap amounts to replacing $X$ by $\bar X$. In terms of
Theorem 17, convexifying $X$ amounts to taking the concave hull of the perturbation
function $v$; but the resulting function need not be upper semicontinuous at 0.
Here lies a technical difficulty, which will be illustrated by the counter-example
22 below. Nevertheless, cases where this difficulty does occur can be considered
as pathological. The duality gap corresponds exactly to this convexification in
most cases, for example those described in the following result:

Theorem 19 Equality holds in Fact 18 in any of the following cases:

(i) $X$ is a bounded set in $\mathbb{R}^n$;

(ii) for any $g \in \mathbb{R}^m$ close enough to 0, there is $x \in X$ such that
$Ax = a + g$;

(iii) there exists $u^*$ minimizing the dual function, and the filling property of
Definition 12 holds at $u^*$. □

- Property (i) above is of course that of “ordinary” Lagrangian relaxation in 
combinatorial optimization; the corresponding result goes back to ca, redis- 
covered on various occasions, see for example H91491 . At this point, it is worth 
mentioning that the closed convex hull of a bounded set is bounded ([25,
Thm. IV.1.4.3] for example): when $X$ is a finite set, $\bar X$ is just the convex
hull of $X$.

- Property (ii) is universally invoked, under the name of Slater condition. Less
known is the fact that it is necessary and sufficient for the dual function to
have a nonempty compact set of minimum points.

- As for Property (iii), the result is easy to establish from Proposition 14 and
the discussion following it.

Remark 20 Remember column generation of ^2.6[ it also replaces a set X by its 
closed convex hull. When oc < -l-oo, Theoren il9\ applies: both column generation 
and Lagrangian relaxation find the common optimal value of 6D or dH). 

When oc= -|-oo - or more generally if X in (US) is an arbitrary (unbounded) 
closed convex set - there may be a pathological duality gap. In that case, La- 
grangian relaxation does not solve CHD, but rather its dual ©• So does column 
generation as well, this will result from ^4-^ below. □ 

The previous development applies when the Lagrangian $L(\cdot,u)$ is affine. Such
is no longer the case when (1) has non-affine data $f$ and/or $c$; the situation is
then substantially more complicated, and we only outline the kind of results that
can be obtained. Similarly to §2.4 and Remark 15, use a slack-technique and
introduce the set

  $\mathcal{G} := \{(x, s_0, s) : x \in X ,\ s_0 \le f(x) ,\ s = c(x)\} \subset \mathbb{R}^n \times \mathbb{R} \times \mathbb{R}^m$ ,          (26)

which combines the graph of $c$ and the hypograph of $f$. Put the $s_0$-axis as
vertical. Solving (1) amounts to intersecting $\mathcal{G}$ with the subspace $s = 0$,
and then finding the highest point in the resulting set. Now write the dual
function as

  $\theta(u) = \max_{(x,s_0,s) \in \mathcal{G}} \; s_0 - u^\top s$
to see that it is not changed if $\mathcal{G}$ is replaced by its closed convex hull.
This shows how Fact 18 and Theorem 19 can be generalized to nonlinear data:
Lagrangian relaxation amounts to close-convexifying the above $\mathcal{G}$. However,
the closed convex hull of $\mathcal{G}$ - and its intersection with $s = 0$ - is a rather
complicated object; so we stop our development at this point and refer the
interested reader to [42].

Exact Relaxation. Finally, a relevant question is when the duality gap is 0.
According to Theorem 17, this occurs exactly when the perturbation function
$v$ coincides with its concave u.s.c. hull at 0. Such is the case when $v$ itself is
concave and upper semicontinuous at 0.

- Concavity holds whenever “everything is convex” in (1) (cf. the discussion
following Theorem 17): $X$ is a convex set (say in $\mathbb{R}^n$), $f$ is a concave
function on $X$, and $c$ is affine: $c(x) = Ax - a$. As already observed in Remark 20,
there is little hope to have a 0 duality gap if $X$, or more generally
$\mathcal{G}$ of (26), is not convex.

- Upper semicontinuity is a “normal” property, in the sense that convex problems
with a duality gap can be considered as pathological. Two sorts of situations
are known, where such a pathology cannot occur:

(i) One is similar to the situation where the filling property of Definition 12
holds, namely when the primal problem enjoys some compactness - see Theorem 13.

(ii) The other is based on the following important result: a convex function is
continuous at a point whenever it is defined ($< +\infty$) in a neighborhood
of this point. Particularizing this result to our context, assume that the
perturbation function $v$ of (24) is concave. It is continuous (hence u.s.c.)
at 0

  - if it is finite ($> -\infty$) in a neighborhood of 0, or equivalently,

  - if the feasible domain remains nonempty despite small perturbations of the
    righthand sides.

This explains the Slater condition popping up in Theorem 19. Interestingly
enough, this situation is somehow dual to (i), in that it corresponds to
compactness in the dual space.

We collect these situations in one exactness result: 

Theorem 21 Let (1) be a convex optimization problem: $X$ is a closed convex
set in $\mathbb{R}^n$, $f : X \to \mathbb{R}$ is a concave function, the constraint-functions
$c_j$ are affine. Assume that the dual function $\theta$ is not identically $+\infty$.
Then there is no duality gap - (1) and its dual have the same optimal value - if
at least one of the following properties holds:

(i) $X$ is a bounded set in $\mathbb{R}^n$, $f$ [resp. each inequality constraint] is
upper [resp. lower] semicontinuous on $X$;

(ii) for any $g \in \mathbb{R}^m$ close enough to 0, there is $x \in X$ such that
$c(x) = g$;

(iii) there exists $u^*$ minimizing the dual function, and the filling property of
Definition 12 holds at $u^*$;

(iv) $X$ is a polyhedral set, $f$ and all constraints are affine functions (linear
programming). □

Footnote (to the closed convex hull of $\mathcal{G}$): comparing with the footnote on the
support function following Fact 18, we have
$\theta(u) = \sigma_{\mathcal{G}}(0, 1, -u)$.

Adapting this result to (non-affine) inequality constraints (Remark 3) and
conic programming (§2.4) is an interesting exercise. We conclude this section with
a counter-example, essentially due to R.T. Rockafellar, to show that pathological
duality gaps do exist.

Example 22 Consider the following instance of (1), set in terms of
$(x,y,z) \in \mathbb{R}^3$:

  $\max x , \quad X := \{(x,y,z) : \sqrt{x^2+y^2} \le z\}$ ,
  $c_1(x,y,z) := x - 1 \le 0$ ,
  $c_2(x,y,z) := z - y \le 0$ ,
  $c_3(x,y,z) := 1 - z \le 0$ .

It is easy to see that the constraints imply $z = y$ and $x = 0$. The optimal value
is 0, attained on all of the feasible set, namely $\{(0,t,t) : t \ge 1\}$.

Now keep $X$ as the universe, introduce three (nonnegative) dual variables
$u, v, w$ and dualize $c_1, c_2, c_3$, forming the Lagrangian

  $L = x - u(x-1) - v(z-y) - w(1-z)$
  $\;\; = (1-u)x + vy + (w-v)z + u - w$ .

Maximizing it with respect to $z$ implies $w - v \le 0$ (otherwise we get $+\infty$),
in which case the optimal $z$ is $\sqrt{x^2+y^2}$. The dual function is therefore
obtained by maximizing

  $L' = (1-u)x + vy - (v-w)\sqrt{x^2+y^2}$

with respect to $x, y$ (up to the constant term $u - w$). A key is to realize that
the maximum of $L'$ is either 0 or $+\infty$, depending on whether
$\sqrt{(1-u)^2+v^2} \le v - w$; in fact the maximal value of $L'$ and its argmax are
given as follows:



  $\sqrt{(1-u)^2+v^2}$      $\max_{x,y} L'$      maximal $(x,y)$
  $< v - w$                 $0$                  $(0,0)$
  $= v - w$                 $0$                  $\mathbb{R}_+ (1-u, v)$
  $> v - w$                 $+\infty$            -



To see it, fix first the norm of $(x,y) \in \mathbb{R}^2$: the maximum is attained for
$(x,y)$ collinear to $(1-u,v)$. Then adjust suitably the norm of $(x,y)$.

Because $w \ge 0$, the condition $\sqrt{(1-u)^2+v^2} \le v - w$ implies $u = 1$ and
$w = 0$; then this condition is satisfied for arbitrary $v \ge 0$. The corresponding
value of $L$ (the dual function) is $u - w = 1$. In summary, the dual optimal value
is 1 ($> 0$), attained on all of the dual feasible set, namely
$\{1\} \times \mathbb{R}_+ \times \{0\}$. □

Footnote: An expert in convex analysis will recognize in $\max L'$ the indicator
function of the ball of radius $(v - w)$ at the point $(1-u, v) \in \mathbb{R}^2$ (up to
the multiplicative term $v - w$).
Among other things, this example illustrates the distinction between Fact 18
and Theorem 19: the objective is linear, the constraints are affine, both primal
and dual problems have optimal solutions. Nevertheless there is a duality gap;
what is missing is the filling property of Definition 12. We mention that the
counter-example relies on a crucial property: its primal and dual optimal sets
are unbounded, as well as the argmax of the Lagrangian at a dual optimal solution.

4 Dual Algorithms I: Subgradient, Cutting Plane, ACCPM

We now come back to the dual space and study numerical methods solving the
dual problem. Mainly for simplicity, we assume that there are no dual constraints:
the Lagrangian problem has a solution for all $u \in \mathbb{R}^m$. Even though it is
not essential for dual algorithms, this assumption substantially eases the
exposition.

Thus we are in the following situation: 

Assumption 23 We are given a convex function $\theta$, to be minimized.

- It is defined on the whole of $\mathbb{R}^m$.

- The only information available on $\theta$ is an oracle, which returns $\theta(u)$ and
one subgradient $g(u)$, for any given $u \in \mathbb{R}^m$.

- No control is possible on $g(u)$ when $\partial\theta(u)$ is not the singleton
$\{\nabla\theta(u)\}$. □

The third assumption above just means that it is impossible to select a
particular maximizer of the Lagrangian, in case there are several. As far as dual
algorithms are concerned, the fact that $g(u)$ has the form $-c(x_u)$ is irrelevant:
$x_u$ is an unknown object to the dual algorithm, which deals only with vectors in
$\mathbb{R}^m$. The present situation is rather classical in the world of nonlinear
programming, where one has to minimize some objective function, given function-
and gradient-values only, say $\theta(u)$ and $g(u)$ respectively. The difference here
is that $g(u)$ does not vary continuously with $u$: we are dealing with nonsmooth
optimization.

Remark 24 In some situations, a much richer oracle is available; most notably 
in the relaxation of quadratically constrained problems of ^2.3[ The corresponding 
dual problems are traditionally solved via interior-point methods, coming from 
SDP optimization (\ 1 \ 73 ^ ). However, problems such as (|8j, or more generally 
(Eoj, can be fruitfully considered as instances of nonsmooth convex problems, 
for which a rather complete subdifjerential analysis can be performed. For such 
problems, algorithms of the type below can be used, as well as more specialized 
ones, taking advantage of the richer information available. This gives birth to 
useful alternatives to interior-point methods, particularly well-suited for large 
SDP problems; see 

There are essentially two types of nonsmooth optimization methods: subgradient
(§4.1) and cutting-plane (§4.2), somehow lying at the opposite sides of a
continuum of methods; in between are found analytic center and bundle methods.

Footnote (on “cutting-plane”): This terminology is unfortunate in a combinatorial
community: here the planes in question cut a portion of the space where
$\mathrm{epi}\,\theta$ is lying.
In what follows, we will call $g_K$ the $g(u_K)$ computed by the oracle of
Assumption 23 when called at some iterate $u_K$.

4.1 Subgradients and Ellipsoid Methods 

Let $u_K$ be the current iterate. Consider the subgradient relation (20) for $u$
strictly better than the current iterate $u_K$: we have

  $\theta(u_K) > \theta(u) \ge \theta(u_K) + g_K^\top (u - u_K)$ ,

which shows that $g_K^\top (u - u_K) < 0$ for any such $u$, in particular for $u = u^*$,
a point minimizing $\theta$. Then it is easy to see that, for $t > 0$ small enough, the
point $u(t) := u_K - t g_K$ is strictly closer than $u_K$ to $u$, hence to any
optimal $u^*$.

This is the starting idea for the subgradient method, going back to jZ2], for- 
malized in n 1I61| , and often called “Lagrangian relaxation” in the combinatorial 
communitjo It works as follows: 

  At iteration $K$, select $t_K > 0$ and set $u_{K+1} = u_K - t_K g_K$ .

It is hard to imagine anything simpler. In fact, the only responsibility of the
method is to choose the stepsize $t_K$, since no control on $g_K$ can be exerted from
the dual space. Besides, the choice of $t_K$ is also the simplest thing in the world,
at least in theory:

Theorem 25 Let the subgradient method be applied with the stepsizes given as
follows:

  $t_K = \dfrac{\lambda_K}{\|g_K\|}$ with $\lambda_K \downarrow 0$ and $\sum_{K=1}^{\infty} \lambda_K = +\infty$ .

Then the sequence of best function-values
$\bar\theta^K := \min\{\theta(u_k) : k = 1, \ldots, K\}$
tends to the minimum value of $\theta$ over $\mathbb{R}^m$. □
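The iteration and the stepsize rule of this theorem fit in a few lines. The sketch below is a minimal illustration under our own assumptions: the test function $\theta(u) = |u_1 - 1| + 2|u_2 + 0.5|$, the choice $\lambda_K = 1/\sqrt{K}$ and all names are hypothetical, chosen only because they satisfy the hypotheses (convex nonsmooth function, $\lambda_K \downarrow 0$, divergent series). The oracle returns the function value and one subgradient, and the method keeps track of the best value found, since the sequence $\theta(u_K)$ itself is not monotone.

```python
import math


def oracle(u):
    """Return theta(u) = |u1 - 1| + 2|u2 + 0.5| and one subgradient at u."""
    def sign(t):
        return (t > 0) - (t < 0)   # sign(0) = 0 is a valid subgradient of |.|
    value = abs(u[0] - 1.0) + 2.0 * abs(u[1] + 0.5)
    g = [float(sign(u[0] - 1.0)), 2.0 * sign(u[1] + 0.5)]
    return value, g


def subgradient_method(u0, iterations):
    """Subgradient method with normalized steps t_K = lambda_K / ||g_K||."""
    u = list(u0)
    best_value, best_u = float("inf"), list(u0)
    for k in range(1, iterations + 1):
        value, g = oracle(u)
        if value < best_value:
            best_value, best_u = value, list(u)
        norm = math.hypot(g[0], g[1])
        if norm == 0.0:            # zero subgradient: u is a minimizer
            break
        lam = 1.0 / math.sqrt(k)   # lambda_K -> 0, sum of lambda_K = +infinity
        t = lam / norm
        u = [ui - t * gi for ui, gi in zip(u, g)]
    return best_u, best_value


if __name__ == "__main__":
    print(subgradient_method([0.0, 0.0], 5000))
```

Even on this tiny example many iterations are needed for a modest accuracy, which is the slow convergence discussed next.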

However, this extreme simplicity has its price in terms of speed of convergence:
the method can only give a coarse approximation of the optimal value.
Anyway observe that the above result is of little practical value: who can wait
long enough to check that $\lambda_K$ satisfies the required properties? This is why a
big issue when implementing this method is a correct adjustment of $t_K$; a hard
job because the real trouble lies in $g_K$, a bad direction.

Considering that better convergence implies better directions, an idea is to
adapt the metric. Make a change of variable in $\mathbb{R}^m$, say $u = Bv$ ($B$ being an
invertible matrix). Then minimizing $\theta'(v) := \theta(Bv)$ is obviously equivalent
to minimizing $\theta$. However, suppose that $\theta$ is differentiable; then the
gradient of $\theta'$ is $\nabla\theta'(v) = B^\top \nabla\theta(Bv)$. It turns out that the
same formula holds for subgradients. As a result, the subgradient iteration to
minimize $\theta'$ is $v_{K+1} = v_K - t_K B^\top g_K$. Multiplying by $B$ to return to the
$u$-space, we obtain $u_{K+1} = u_K - t_K B B^\top g_K$: the direction has changed, as
requested.

Footnote (on calling the subgradient method “Lagrangian relaxation”): This
terminology is unfortunate in any community: it mixes the general methodology
of Lagrangian relaxation and the particular technique of subgradient optimization:
the latter method is by no means the only possibility to solve the dual problem -
or more generally, to minimize a convex function.

Footnote (on $g_K$ being a bad direction): In nonlinear programming, $-g_K$
corresponds to the gradient method, or steepest descent, only used by amateurs.

N.Z. Shor thus devised a class of variable metric subgradient methods, with
a metric $B = B_K$ varying at each iteration. To construct $B_K$, he used the
elementary operator dilating the space along a given vector $\xi = \xi_K$, suitably
chosen. He gave two choices for $\xi_K$.

The Ellipsoid Algorithm. His first idea was motivated by the following
observation ([65, §3.1]): assume for simplicity that $\theta$ has a unique minimum
point $u^*$; the (sub)gradient direction is so bad that it may be considered as
orthogonal to the direction pointing to $u^*$. It is therefore advisable to favour
the subspace orthogonal to $B_K g_K$. This led him to take $\xi_K = B_K g_K$; the
resulting algorithm
was given in m- Later on, A.S. Nemirovski defined specific values for the co- 
efficient of dilatation and for the stepsize tx [M]) and this was the celebrated 
ellipsoid algorithm. 



The r-Algorithm. Shor was never too convinced by the behaviour of the
above algorithm, and he defined another choice for $\xi_K$. Consider a point where
$g(\cdot)$ is discontinuous: for two values of $u$ rather close together, say $u_K$ and
$u_{K-1}$, the oracle returns two very different subgradient-values $g_K$ and
$g_{K-1}$. Then there are good reasons to think that the desired direction
(pointing toward $u^*$) is orthogonal to $g_K - g_{K-1}$ (draw a picture in two
dimensions, with $g_K$ and $g_{K-1}$ quasi-opposite). It is therefore advisable to
dilate the space along the bad direction $\xi_K := g_K - g_{K-1}$. This gave birth
to the r-algorithm.

Remark 26 Another derivation of the subgradient method is worth mentioning.
Take a nonsingular $m \times m$ matrix $M$ and form the quadratic function
$q(h) := \theta(u_K) + g_K^\top h + \frac{1}{2} h^\top M h$. Straightforward calculations
show that it has a minimum point at $h = -M^{-1} g_K$. Thus, a subgradient algorithm
can be derived as follows: consider $q(h)$ with $M = B^{-\top} B^{-1}$ as a “model”
approximating $\theta(u_K + h)$; then minimize $q$; then move along the direction thus
obtained with a stepsize $t_K$. This idea of an approximating model such as $q$ is
important in optimization; it will be most instrumental in the methods to come. □



We conclude this section with a comment about primal recovery. Indeed the 
subgradient algorithm does provide the primal point x* of FactlTHI This little- 
known property belonged to folklore ( |65[ p. 116], [^), until it was formally given 
in [^. In fact, suppose each primal point xu, computed by the oracle at Uk, are 
stored along the iterations. Then compute

  $\hat x_K := \dfrac{\sum_{k=1}^{K} t_k x_k}{\sum_{k=1}^{K} t_k}$ .

Under reasonable assumptions, this averaged point converges to $x^*$ of (23).
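Primal recovery by averaging can be observed on the smallest possible example. The sketch below is an illustration under our own assumptions: the instance ($X = \{0,1\}$, $f(x) = x$, $c(x) = x - 0.5$), the stepsizes $t_k = 1/\sqrt{k}$ and the function names are hypothetical, and the average is weighted by the stepsizes, which is one common choice. No point of $X$ satisfies $c(x) = 0$, the dual optimum is $u^* = 1$, and the convex combination of Proposition 14 is $x^* = \frac{1}{2}\cdot 0 + \frac{1}{2}\cdot 1$.

```python
import math

# Hypothetical toy instance: X = {0, 1}, f(x) = x, constraint c(x) = x - 0.5 = 0.
X = [0.0, 1.0]


def f(x):
    return x


def c(x):
    return x - 0.5


def oracle(u):
    """Maximize the Lagrangian f(x) - u*c(x) over X.
    Returns the maximizer x_u and the subgradient g = -c(x_u)."""
    x_u = max(X, key=lambda x: f(x) - u * c(x))
    return x_u, -c(x_u)


def averaged_primal_point(u0, iterations):
    """Run the subgradient method, storing the stepsize-weighted average
    of the primal points returned by the oracle."""
    u = u0
    weighted_sum, total_step = 0.0, 0.0
    for k in range(1, iterations + 1):
        x_k, g_k = oracle(u)
        t_k = 1.0 / math.sqrt(k)
        weighted_sum += t_k * x_k
        total_step += t_k
        u = u - t_k * g_k
    return weighted_sum / total_step, u


if __name__ == "__main__":
    x_hat, u_last = averaged_primal_point(0.0, 10000)
    print(x_hat, u_last)
```

Since $u_{K+1} - u_1 = \sum_k t_k c(x_k)$ and $c$ is affine, the infeasibility $c(\hat x_K)$ equals $(u_{K+1} - u_1)/\sum_k t_k$, which tends to 0 as soon as the dual iterates stay bounded; this is why the average approaches $x^* = 0.5$ even though each individual $x_k$ is 0 or 1.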
4.2 The Cutting-Plane Method, or of Kelley, Cheney-Goldstein

The previous methods were simplest, here is now a fairly sophisticated one, 
going back to [ZEB]. Every call to the oracle, input with Uk say, defines an affine 
function approximating $\theta$ from below: $\theta(u) \ge \theta(u_k) + g_k^\top (u - u_k)$
for all $u \in \mathbb{R}^m$ (with equality for $u = u_k$). After $K$ such calls, we have
on hand a piecewise affine function $\hat\theta$ which underestimates $\theta$: for all
$u \in \mathbb{R}^m$,

  $\theta(u) \ge \hat\theta(u) := \max \{\theta(u_k) + g_k^\top (u - u_k) : k = 1, \ldots, K\}$ ,          (27)

which we will call the cutting-plane model of $\theta$. It is entirely characterized by
the $K$-uple of elements $(\theta(u_k), g_k) \in \mathbb{R} \times \mathbb{R}^m$; this is what we
will call the bundle of information.

Admitting that the model approximates correctly the true $\theta$, minimizing
it makes sense; and this is a linear programming problem. Thus, the method
commonly called “cutting planes” in nonlinear programming, or also
Kelley-Cheney-Goldstein, consists in computing an optimal solution
$(u_{K+1}, r_{K+1})$ of

  $\min r , \quad (u,r) \in \mathbb{R}^m \times \mathbb{R} ,$
  $r \ge \theta(u_k) + g_k^\top (u - u_k)$ for $k = 1, \ldots, K$ .          (28)

The next iterate is $u_{K+1}$, with its model-value $\hat\theta(u_{K+1}) = r_{K+1}$. A call
to the oracle input with $u_{K+1}$ gives a new piece $(\theta(u_{K+1}), g_{K+1})$ enriching
the bundle (raising the cutting-plane model $\hat\theta$) and the process is repeated.
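In one dimension the linear program can be solved by inspection, which gives a compact way to watch the method at work. The sketch below is an illustration under our own assumptions: the iterates are confined to an interval `[lo, hi]` (a simple device to keep them bounded), the model is minimized by evaluating it at the interval endpoints and at all pairwise intersections of the cutting planes, and the stopping test compares the best function value with the model minimum. The function name `kelley_1d` and the quadratic test function are hypothetical.

```python
def kelley_1d(oracle, lo, hi, tol=1e-6, max_iter=100):
    """Cutting-plane method for a convex function of one variable on [lo, hi].
    oracle(u) must return (theta(u), g) with g a subgradient at u.
    Returns (best iterate, best value, last model minimum)."""
    bundle = []                       # triples (u_k, theta(u_k), g_k)
    u = lo
    best_u, best_val = u, float("inf")
    r_next = float("-inf")
    for _ in range(max_iter):
        val, g = oracle(u)
        bundle.append((u, val, g))
        if val < best_val:
            best_u, best_val = u, val

        def model(v):
            # cutting-plane model: maximum of the affine minorants
            return max(tk + gk * (v - uk) for uk, tk, gk in bundle)

        # A convex piecewise affine function on an interval is minimized
        # at an endpoint or at an intersection of two of its pieces.
        candidates = [lo, hi]
        for i in range(len(bundle)):
            ui, ti, gi = bundle[i]
            for j in range(i + 1, len(bundle)):
                uj, tj, gj = bundle[j]
                if gi != gj:
                    v = ((tj - gj * uj) - (ti - gi * ui)) / (gi - gj)
                    if lo <= v <= hi:
                        candidates.append(v)
        u_next = min(candidates, key=model)
        r_next = model(u_next)        # lower bound on the minimum over [lo, hi]
        if best_val - r_next <= tol:  # bracket is tight enough: stop
            break
        u = u_next
    return best_u, best_val, r_next


if __name__ == "__main__":
    quadratic = lambda u: ((u - 1.0) ** 2, 2.0 * (u - 1.0))
    print(kelley_1d(quadratic, -3.0, 4.0))
```

The first iterates jump from one end of the interval to the other before settling down, a small-scale picture of the instability that motivates the stabilized variants.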

In view of (27), we have by construction
$r_{K+1} = \hat\theta(u_{K+1}) \le \hat\theta(u) \le \theta(u)$ for all $u \in \mathbb{R}^m$. Then
the number

  $\delta := \theta(u_K) - r_{K+1} = \theta(u_K) - \hat\theta(u_{K+1}) \ge 0$          (29)

is of particular interest.

- It represents a nominal decrease, which would be dropped from the true $\theta$ if
the model were accurate enough.

- It serves to bracket the dual optimal value: because
$r_{K+1} = \theta(u_K) - \delta \le \theta(u)$ for all $u$, we have

  $\theta(u_K) \ge \min\theta \ge \theta(u_K) - \delta = r_{K+1}$ .

- Hence it can serve as stopping test: the algorithm can be stopped as soon as
$\delta$ is small, and this will eventually occur if (and only if) 0 is a cluster
point of the sequence of $\delta$-values.

- In view of the bracket, this last event exactly means

  $\liminf \theta(u_K) = \limsup r_{K+1} = \min\theta$ .

Footnote (on the nominal decrease): It is even a maximal decrease, and even an
upper bound on the total possible decrease $\theta(u_K) - \min\theta$.

Footnote (on the stopping test): A tradition is rather to stop when
$\theta(u_{K+1}) - r_{K+1}$ is small. We introduce $\delta$ because it will be a central
object in bundle methods, §5 below. Anyway the most reasonable test is based on
the best function-value $\bar\theta^K$ of Theorem 25 and should be preferred.
- The above sequence $(\theta(u_K))$ can be - and usually is - chaotic. By contrast,
the sequence $(r_K)$ is obviously nondecreasing (since the model $\hat\theta$ increases
at each iteration) and bounded from above (by any $\theta(u)$); it therefore has a
limit.

Thus the key issue for convergence is whether $r_K \uparrow \min\theta$. Such is the case
indeed, but with a technical restriction which turns out to be important: a
mechanism is needed to guarantee boundedness of the sequence $(u_K)$.



The Case of Column Generation. To close this section, we proceed to show
that the present cutting-plane algorithm is nothing other than column generation
of §2.6. In fact form the dual of (28): with (bi)dual variables $\alpha_k \ge 0$, the
(bi)Lagrangian is

  $L'(u,r,\alpha) = r - r \sum_{k=1}^{K} \alpha_k + \sum_{k=1}^{K} \alpha_k \bigl(\theta(u_k) - g_k^\top u_k\bigr) + u^\top \sum_{k=1}^{K} \alpha_k g_k$ .

Minimizing $L'(u,\cdot,\alpha)$ forces $\sum_k \alpha_k = 1$: the $\alpha_k$'s form a set of
convex multipliers. Minimizing $L'(\cdot,r,\alpha)$ forces $\sum_k \alpha_k g_k = 0$. The
(bi)dual problem is therefore

  $\max \sum_{k=1}^{K} \alpha_k \bigl(\theta(u_k) - g_k^\top u_k\bigr) , \quad \alpha_k \ge 0 , \quad \sum_{k=1}^{K} \alpha_k = 1 , \quad \sum_{k=1}^{K} \alpha_k g_k = 0$ .

Now remember that we are actually solving a dual problem: returning to the
primal space of (1), we can write $\theta(u_k) = L(x_k,u_k)$ and $g_k = -c(x_k)$, so that
$\theta(u_k) - u_k^\top g_k = f(x_k)$. Thus, the above (bi)dual is

  $\max \sum_{k=1}^{K} \alpha_k f(x_k) , \quad \alpha \in \Delta_K , \quad \sum_{k=1}^{K} \alpha_k c(x_k) = 0$ ,          (30)

where $\Delta_K$ denotes the unit-simplex in $\mathbb{R}^K$.

This problem is definitely posed in the primal space: it involves (convex
combinations of) objective- and constraint-values. When the objective function
$f(x) = b^\top x$ is linear and the constraints $c(x) = Ax - a$ are affine, we recognize
exactly the column-generation master problem, $X_K$ being the convex hull of
$\{x_1, \ldots, x_K\}$. In summary:

Fact 27 Knowing that (II6D is just an instance o/([T} with polyhedral/linear data 
X,f,c, soluing it by the column- generation master (ED amounts to solving the 
dual {H by the cutting-plane algorithm (l28t . □ 

This supplements Remark|H] (in particular, it makes no difference to have 
X or its convex hull as universe, remember Fact fTSl) . As a consequence, we see 
that 

Footnote: Remember that $g_k \in \partial\theta(u_k)$. An expert in convex analysis will
detect the conjugate function: $-\theta^*(g_k) = \theta(u_k) - g_k^\top u_k$, which
“explains” Everett’s Theorem 11.
- the present cutting-plane algorithm,

- the subgradient algorithm of §4.1,

- the variants of the latter (ellipsoid, r-algorithm),

- ACCPM, which will be seen below in §4.3,

- the bundle algorithms of §5,

all of these are various alternatives to solve the same problem: © or m, or at 
least their convex relaxation. 

Reverting the above formulation, we find generalized linear programming: 

Fact 28 The cutting-plane algorithm (128 ll . as well as all of the above-mentioned 
alternatives, generalize the column- generation mechanism of ^2.6\ to nonlinear/- 
nonpolyhedral data in (HU - knowing that (1281) is just the so-called generalized 
linear programming algorithm. □ 

A final word, with relation to Fact lltil going along the end of >14.11 Again 
suppose the primal points Xk, computed by the oracle at Uk, are stored along 
with the dual elements 9(uk) and gk- Then an optimal solution a of the (bi)dual 
(OCT allows the computation of the primal point x := OikXk- This point, which 
actually solves (I19II . approximates a primal optimal solution as mentioned in >13.21 
see (OCT again. 



4.3 Stability Problems: ACCPM 

From a practical point of view, observe that (28) is obviously feasible (take $r$
large enough) but has no reason to be bounded from below: $\hat\theta$ has no reason to
have a finite minimum. Think of $K = 1$: we get $r_2 = -\infty$ and $u_2$ at infinity,
and this no matter how close $u_1$ is to a minimum of $\theta$. The cutting-plane
algorithm is inherently unstable. The mechanism to bound $u_K$, required by
theory, has also a practical role: to stabilize the algorithm.

Although apparently naive, the above argument is a serious one and serves
as a basis for an impressive counter-example found in [54] (and reproduced in
[25, §XV.1.1]). A function is defined, not particularly pathological - essentially
$\theta(u) = \max\{0, -1 + \|u\|\}$ -, and for which obtaining the approximate optimality
condition $\delta \le \varepsilon$ requires $K(\varepsilon) \sim (1/\varepsilon)^m$ cutting-plane
iterations.

By comparison, the ellipsoid algorithm performs as follows: to obtain the
same accuracy in the worst case (no matter how nasty the function $\theta$ can be),
of the order of $m \log(1/\varepsilon) = \log K(\varepsilon)$ iterations are needed at most.

This bad behaviour of the cutting-plane algorithm is due to the fact that the 
model 9 is much too optimistic; something else is needed. 

Remark 29 In nonlinear programming, it is commonly admitted that an opti- 
mization algorithm can hardly be good if it is not based on a model approximating 

For bundle methods to come, the worst-case number of iterations is of the order 
1/e® (see |74I33| 1. This is definitely worse than ellipsoid, but does not depend on the 
dimension m of the space. 
the objective function to second order. Here $\hat\theta$ can be considered as a
“global” approximation of $\theta$ (in the sense that it provides information all over
the space $\mathbb{R}^m$); but it is a good local approximation nowhere. The best we can
hope is to have the 1st-order approximation property
$\hat\theta(u_k + h) = \theta(u_k + h) + o(\|h\|)$ near each reference point $u_k$.

Unfortunately, while it is conceivable to construct convenient first-order
approximations of convex functions, second-order is at present fictional (do not forget
that the $\theta$ we are minimizing does not even have continuous first derivatives);
for efforts along these lines see for example and the references therein. 

This is true only for oracle- functions, though; when more information is known 
from the objective functions, more is possible; see m Chap. 14], the more recent 
and Remar Mi^^ □ 

A stabilizing idea in the spirit of interior-point methods works as follows. Let
$\theta^{\mathrm{best}}$ be the best function-value obtained so far and form the set

$$P_K := \{(u,r) : \hat\theta(u) \le r \le \theta^{\mathrm{best}}\}
 = \{(u,r) : \theta(u_k) + g_k^\top(u - u_k) \le r \le \theta^{\mathrm{best}} \text{ for } k = 1,\dots,K\}.$$

This is a polyhedron in the graph-space $\mathbb{R}^m \times \mathbb{R}$ which we assume bounded
(possibly via the mechanism used to bound the cutting-plane iterates).

The cutting-plane algorithm consists in taking for $(u_{K+1}, r_{K+1})$ a point that
is lowest in $P_K$; according to Nemirovski’s counter-example, this is a bad point.
A better idea is to take a point that is central in $P_K$. A good concept for this,
due to G. Sonnevend [69], is the analytic center of $P_K$: the point maximizing the
product of slacks (or sum of their logs) of the constraints defining $P_K$. Based on
this idea, the method ACCPM (analytic center cutting-plane method) has been
defined; see [22] for a recent overview.

To conclude this section, let us mention that the same stability problem exists
for subgradient methods of §4.1: $g_k$ jumps along the iterations, reflecting
the nondifferentiability¹⁴ of $\theta$. It can even be said that instability is an inherent
difficulty with Lagrangian relaxation, and more generally with nonsmooth
optimization. Indeed remember Pronosition ll 41 let only to prove optimality of a 
given u* , the dual algorithm has to get from the oracle a number (possibly as 
large as m-l- 1) of distinguished primal points Xk, in order to make up the point 
X* of FactfTT)! 

5 Dual Algorithms II: Bundle Methods 

Just as ACCPM, bundle methods start from cutting planes and aim at overcoming
the instability problem mentioned in §4.3. However, while ACCPM preserves the
global character of the cutting-plane model $\hat\theta$ (in the sense that $P_K$ does not
favour any of the successive iterates $u_1,\dots,u_K$), they borrow from nonlinear
programming the concept of local approximation - see Remark 29.

¹⁴ This phenomenon is best explained on the column generation model of §2.6: when
the $c = c_u$ of Assumption 3(iii) varies, the $\varkappa$ returned by the oracle can only jump
from one $\varkappa_k$ to another.
5.1 The Algorithmic Scheme

The rationale of bundle methods is as follows:

- A stability center is chosen; denote it by $\hat u$: it is a point one would like $u_{K+1}$
to be close to; its choice will be made precise below.

- As said already, the cutting-plane model $\hat\theta$ is much too optimistic: say $\hat\theta \ll \theta$
in a neighborhood of $\hat u$.

- As a result, minimizing $\hat\theta$ gives a $u_{K+1}$ much too far from $\hat u$ (a disaster revealed
by Nemirovski’s counter-example).

- It is therefore advisable to add a positive term to $\hat\theta$, so as to improve its local
approximation character;

- we choose a quadratic stabilizing term, of the type $\|u - \hat u\|^2$,

- whose effect is to pull $u_{K+1}$ toward $\hat u$, thereby stabilizing the algorithm.

Using a spring strength $t > 0$ (which may depend on the iteration number)
aimed at tuning the stabilization effect, the method therefore approximates the
true function $\theta(u)$ by the “stabilized” cutting-plane model

$$\breve\theta(u) := \hat\theta(u) + \frac{1}{2t}\|u - \hat u\|^2, \qquad (31)$$

which is minimized at each iteration. Instead of (28), a linearly constrained
quadratic problem is solved:

$$\begin{cases} \min\ r + \frac{1}{2t}\|u - \hat u\|^2, & (u,r) \in \mathbb{R}^m \times \mathbb{R}, \\ r \ge \theta(u_k) + g_k^\top(u - u_k) & \text{for } k = 1,\dots,K. \end{cases} \qquad (32)$$

This problem always has a unique optimal solution $(u_{K+1}, r_{K+1})$. Besides,
the two positive numbers

$$\hat\delta := \theta(\hat u) - \hat\theta(u_{K+1}), \qquad
\breve\delta := \theta(\hat u) - \breve\theta(u_{K+1}) = \hat\delta - \frac{1}{2t}\|u_{K+1} - \hat u\|^2 \qquad (33)$$

still represent a prediction of the expected decrease $\theta(\hat u) - \theta(u_{K+1})$, probably
more reasonable than the $\delta$ of (29) produced by (28) (remember note 11, p. 136).

To complete the iteration, we must take care of the stability center. For this
we test the actual decrease, as compared to $\hat\delta$; say

$$\theta(u_{K+1}) \le \theta(\hat u) - \kappa\hat\delta, \qquad (34)$$

$\kappa$ being a fixed tolerance ($\breve\delta$ could be preferred to $\hat\delta$ but this is a detail: one shows
that $\hat\delta \in [\breve\delta, 2\breve\delta]$). If (34) holds, we set $\hat u = u_{K+1}$; in the bundle jargon, this is
called a descent-step, as opposed to a null-step: if (34) does not hold, $\hat u$ is left
as it is (but $\hat\theta$ is enriched). For convergence, $\kappa$ must be positive. Besides, one
easily sees that the descent property (34) equivalently means that the model is
accurate enough:

$$\theta(u_{K+1}) \le \hat\theta(u_{K+1}) + (1 - \kappa)\hat\delta.$$

Thus, $\kappa$ must also be smaller than 1, otherwise $\hat u$ will never be updated.
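
The iteration just described (master problem, predicted decrease, descent test) can be sketched in one dimension, where the quadratic master can be solved exactly by enumeration. Everything specific below is an assumption made for illustration: the test function $|u - 2| + 0.5\,|u + 1|$, the names `oracle`, `solve_master` and `bundle_method`, and the default values of `t`, `kappa` and `tol`. The stopping rule used here asks for both the predicted decrease and the step (divided by $t$) to be small.

```python
def oracle(u):
    """Toy oracle (an assumption): theta(u) = |u - 2| + 0.5 |u + 1| and one subgradient."""
    value = abs(u - 2.0) + 0.5 * abs(u + 1.0)
    g = (1.0 if u >= 2.0 else -1.0) + 0.5 * (1.0 if u >= -1.0 else -1.0)
    return value, g


def solve_master(pieces, center, t):
    """Minimize max_k (b_k + g_k u) + (u - center)^2 / (2 t), exactly, in 1-D only.

    The minimizer is either the stationary point of one piece or a crossing
    of two pieces, so it suffices to compare the objective at those points.
    """
    def model(u):
        return max(b + g * u for b, g in pieces)

    def objective(u):
        return model(u) + (u - center) ** 2 / (2.0 * t)

    candidates = [center - t * g for _, g in pieces]
    for i, (bi, gi) in enumerate(pieces):
        for bj, gj in pieces[i + 1:]:
            if gi != gj:
                candidates.append((bj - bi) / (gi - gj))
    u_next = min(candidates, key=objective)
    return u_next, model(u_next)


def bundle_method(u0, t=1.0, kappa=0.1, tol=1e-10, max_iter=100):
    center = u0
    theta_center, g = oracle(center)
    pieces = [(theta_center - g * center, g)]        # lines u -> b + g * u
    for _ in range(max_iter):
        u_next, model_value = solve_master(pieces, center, t)
        delta_hat = theta_center - model_value       # predicted decrease
        step = (center - u_next) / t
        if delta_hat <= tol and abs(step) <= tol:
            break
        value, g = oracle(u_next)
        pieces.append((value - g * u_next, g))       # the model is enriched in both cases
        if value <= theta_center - kappa * delta_hat:
            center, theta_center = u_next, value     # descent step
        # otherwise: null step, the stability center is left as it is
    return center, theta_center


print(bundle_method(10.0))
```

With these choices the stability center moves by steps of length at most $1.5\,t$ and stops at the kink $u = 2$, where the toy function attains its minimum $1.5$.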
Let us now consider the stopping test. Because $u_{K+1}$ minimizes $\breve\theta$ instead of
$\hat\theta$, and because $\breve\theta$ need not be smaller than the actual objective function, neither
$\breve\delta$ nor $\hat\delta$ of (33) provides a safe bracket of the optimal value $\min\theta$. Unless one really
trusts the approximation $\breve\theta$ of (31), stopping simply when $\breve\delta$ or $\hat\delta$ is small may be
unduly optimistic. Something safer is as follows: $u_{K+1}$ minimizes the model $\breve\theta$,
which is a convex function, hence $0 \in \partial\breve\theta(u_{K+1}) = \partial\hat\theta(u_{K+1}) + \frac{1}{t}(u_{K+1} - \hat u)$.
This can be written

$$u_{K+1} - \hat u = -t\hat g \quad \text{for some } \hat g \in \partial\hat\theta(u_{K+1}), \qquad (35)$$

which reveals an important object: $\hat g = (\hat u - u_{K+1})/t$, called the regularized
gradient in the bundle jargon. As shown in Theorem 32 below, it can be computed
from the dual of (32).

Thus we can write for all $u \in \mathbb{R}^m$:

$$\theta(u) \ge \hat\theta(u) \ge \hat\theta(u_{K+1}) + \hat g^\top(u - u_{K+1})
 = \theta(\hat u) - \hat\delta + \hat g^\top(u - u_{K+1}), \qquad (36)$$

so that we can stop when both $\hat\delta$ and $\hat g$ are small¹⁵. We summarize various
convergence properties of bundle methods, collected from |74I2D|8| . ESI Chaps. IX,
XIII, XV]; see also pTH] .

Theorem 30 We assume for simplicity that the stepsize $t$ remains fixed along
the iterations; say $t = 1$.

(i) There holds at each iteration

$$\theta(u) \ge \theta(\hat u) - \hat\delta + \hat g^\top(u - \hat u)
 \ge \theta(\hat u) - \hat\delta - \|\hat g\|\,\|u - \hat u\| \quad \text{for all } u \in \mathbb{R}^m.$$

(ii) Any of the following equivalent events

$$\hat g = 0 \quad\text{or}\quad \hat\delta = 0 \quad\text{or}\quad \breve\delta = 0 \quad\text{or}\quad u_{K+1} = \hat u$$

guarantees that $\hat u$ minimizes $\theta$ over $\mathbb{R}^m$.

(iii) The sequences $(\hat g)$, $(\hat\delta)$, $(\breve\delta)$ and $(\hat u - u_{K+1})$ tend to 0.

(iv) The sequence of stability centers is minimizing: $\theta(\hat u) \downarrow \min\theta$.

(v) If $\theta$ has a nonempty set of minimizing points, then the entire sequence of
stability centers $(\hat u)$ converges to such a minimizing point.

(vi) The events mentioned in (ii) do occur at some finite iteration $K$ when the
set of possible answers from the oracle is finite. □

Case (vi) above concerns a finite universe $X$ in the primal problem and means
that $\theta$ is polyhedral; this is for example the case in column generation.

Even though this result considers a fixed stepsize, it should be noted that
numerical efficiency depends crucially on a varying $t$, suitably updated at each
iteration. This is for sure a weak point of the approach, although some update
strategies do exist: [32, 43].

¹⁵ In anticipation of §5.2 below, note that the cutting-plane algorithm of §4.2 produces
(36) with $\hat g = 0$.
Remark 31 Having thus derived bundle methods from the cutting-plane method
of §4.2, we can also compare (35) with the subgradient iteration of §4.1.

- Here the next iterate is moved from the stability center $\hat u$, instead of the current
iterate $u_K$.

- Alternatively, a subgradient method can be seen as a sort of bundle method
without any check for descent such as (34): one systematically sets $\hat u = u_K$.

- The move is made in a direction resembling a subgradient. Actually, it can be
shown that $\hat g \in \partial\theta(\hat u)$ if $t$ is small enough.

- We also mention that the $\hat g$ of (35) is vaguely related to the direction used in
the r-algorithm (not necessarily with $t$ small). Specifying this link more clearly
needs investigation, though. □

Thus, a subgradient method can in a way be seen as a particular bundle
method, obtained for small $t$. Conversely, it is rather clear in (31) that a large $t$
results in a weak stabilization: we come close to the pure cutting-plane method.
Between these two extremes, bundle methods form a whole continuum
parameterized by $t$.

5.2 Primal-Dual Relations: Augmented Lagrangian 

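Forming the dual of (32) is a good exercise illustrating §2.2, and is quite parallel
to the derivation of (30). With nonnegative (bi)dual variables $\alpha \in \mathbb{R}^K$, the
(bi)Lagrangian is

$$L'(u, r, \alpha) := r + \frac{1}{2t}\|u - \hat u\|^2 + \sum_{k=1}^K \alpha_k\big(\theta(u_k) + g_k^\top(u - u_k) - r\big).$$

Minimizing $L'(u,\cdot,\alpha)$ forces $\sum_k \alpha_k = 1$: the $\alpha_k$’s form a set of convex multipliers.
Minimizing $L'(\cdot, r, \alpha)$ produces the unique $u = u_\alpha = \hat u - t\hat g(\alpha)$, where we have set
$\hat g(\alpha) := \sum_k \alpha_k g_k \in \mathbb{R}^m$. Plugging this value into $L'$ gives the (bi)dual function

$$\sum_{k=1}^K \alpha_k\big(\theta(u_k) - g_k^\top u_k\big) + \hat u^\top\hat g(\alpha) - \frac{t}{2}\|\hat g(\alpha)\|^2.$$

Now remember that we are actually solving a dual problem: returning to the
primal space, we can write $\theta(u_k) = L(x_k, u_k)$ and $g_k = -c(x_k)$, so that
$\theta(u_k) - g_k^\top u_k = f(x_k)$. Recalling the notation $\Delta_K \subset \mathbb{R}^K$ for the unit-simplex,
the (bi)dual problem can be written

$$\max_{\alpha \in \Delta_K}\ \sum_{k=1}^K \alpha_k f(x_k) - \frac{t}{2}\|\hat g(\alpha)\|^2 + \hat u^\top\hat g(\alpha), \qquad \sum_{k=1}^K \alpha_k c(x_k) = -\hat g(\alpha). \qquad (37)$$

Duality between the pair of linear-quadratic problems (32), (37) gives (there is
no duality gap since, for example, (37) has a compact feasible set):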
Theorem 32 Let $\hat\alpha$ solve (37). The optimal solution $(u_{K+1}, r_{K+1} = \hat\theta(u_{K+1}))$
of (32) is then obtained from (35) with $\hat g := \hat g(\hat\alpha)$.

Besides, we have

$$\hat f := \sum_{k=1}^K \hat\alpha_k f(x_k) = \theta(\hat u) - \hat\delta + t\|\hat g\|^2 - \hat u^\top\hat g.$$

□
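
For a bundle of two pieces the dual (37) is a concave quadratic in a single multiplier, so it can be solved in closed form and the relation with the primal master (32) can be checked numerically. The following sketch does this under assumptions of ours: each piece is stored as a pair $(b_k, g_k)$ with $b_k = \theta(u_k) - g_k^\top u_k$ (which is $f(x_k)$ in the notation above), and the names `dot`, `two_piece_master` and `primal_value` are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def two_piece_master(center, t, piece1, piece2):
    """Solve the dual (37) for K = 2 and recover u_{K+1} from (35).

    Each piece is (b_k, g_k); the corresponding cutting plane is u -> b_k + g_k . u.
    """
    (b1, g1), (b2, g2) = piece1, piece2
    d = [x - y for x, y in zip(g1, g2)]               # g1 - g2
    dd = dot(d, d)
    if dd == 0.0:
        a = 1.0 if b1 >= b2 else 0.0                  # parallel planes: keep the higher one
    else:
        # Stationary point of the concave quadratic in a, clipped to [0, 1].
        a = ((b1 - b2) + dot(center, d) - t * dot(g2, d)) / (t * dd)
        a = min(1.0, max(0.0, a))
    g_hat = [a * x + (1.0 - a) * y for x, y in zip(g1, g2)]
    u_next = [c - t * g for c, g in zip(center, g_hat)]
    dual_value = (a * b1 + (1.0 - a) * b2
                  + dot(center, g_hat) - 0.5 * t * dot(g_hat, g_hat))
    return (a, 1.0 - a), g_hat, u_next, dual_value


def primal_value(center, t, pieces, u):
    """Objective of the master (32) at u, with r set to the model value."""
    r = max(b + dot(g, u) for b, g in pieces)
    diff = [x - c for x, c in zip(u, center)]
    return r + dot(diff, diff) / (2.0 * t)


pieces = [(0.0, [1.0, 0.0]), (0.0, [-1.0, 1.0])]
alpha, g_hat, u_next, dual_value = two_piece_master([0.0, 0.0], 1.0, *pieces)
print(alpha, u_next, dual_value, primal_value([0.0, 0.0], 1.0, pieces, u_next))
```

On this small example both pieces are active at the solution and the primal and dual optimal values coincide, as expected from duality between (32) and (37).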



The interesting point is to compare (37) with (30): while the cutting-plane
method imposes the constraint $\sum_k \alpha_k c(x_k) = 0$ at each iteration, a bundle
method relaxes this constraint via two terms in the objective function:

- one is a penalization with the penalty coefficient $t > 0$;

- the other is a dualization with the value $\hat u$ for the dual variable; incidentally,
the term $\sum_k \alpha_k f(x_k) + \hat u^\top\hat g(\alpha)$ in (37) can also be written $\sum_k \alpha_k L(x_k, \hat u)$.

This is the so-called augmented Lagrangian technique, well-known and much
studied in nonlinear programming. It combines Lagrangian
relaxation with an alternative way of eliminating constraints: penalization.

Remark 33 As already said on several occasions, a bundle algorithm can be
used for column generation. After convergence of the algorithm, the primal point
$\hat x := \sum_k \hat\alpha_k x_k$ of (37) can still be constructed. In contrast with the cutting-plane
case, however, this point is not feasible during the course of the algorithm. Theorem 30(iii)
guarantees that this point becomes feasible asymptotically: $\hat g$ (standing
for $g(\hat x)$ or $A\hat x - a$) tends to 0. Besides, $\hat f$ (standing for $f(\hat x)$ or $b^\top\hat x$) tends to
$\min\theta$ if $\hat g^\top\hat u$ tends to 0 - for example if $\hat u$ is bounded; in this case, $\hat x$ becomes
optimal for the relaxed primal problem. □

Augmented Lagrangian is an important theory; let us consider it from a
higher point of view. To eliminate constraints in a general optimization problem
such as the primal problem, an alternative to dualizing them is to penalize them: one maximizes
without constraints $f(x) - \frac{t}{2}\|c(x)\|^2$ over the whole of $X$; quite a simple idea. It
is not difficult to establish the asymptotic equivalence of this penalized problem
with the original one when $t \to +\infty$. Augmented Lagrangian does something more subtle:
it applies Lagrangian relaxation to

$$\max\ f(x) - \frac{t}{2}\|c(x)\|^2, \qquad x \in X, \quad c(x) = 0. \qquad (38)$$

This seems of no avail: using simultaneously relaxation and penalization seems
simply redundant. However, the situation is substantially different in the dual.
In fact duality applied to (38) gives the augmented Lagrangian

$$L_t(x,u) := f(x) - \frac{t}{2}\|c(x)\|^2 - u^\top c(x)$$

and the corresponding augmented dual function

$$\theta_t(u) := \max_{x \in X} L_t(x,u).$$
It can be shown that

(i) if $u = u^*$ minimizes $\theta$ and if there is no duality gap, maximizing
$L_t(\cdot, u^*)$ can only produce primal points which are feasible, hence optimal:
any “parasitic” infeasible primal point is eliminated;

(ii) if $t \to +\infty$, maximizing $L_t(\cdot,u)$ produces again primal optimal solutions,
even for bad $u$.

Concerning (ii), the influence of t can be drastic: compare the next result with 
TheoremfTTl 



Fact 34 Under mild and natural assumptions (always satisfied, for example, if 
X is a finite set), augmented Lagrangian cancels out the duality gap: for t large 
enough, min^t = u(0) of □ 
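
A tiny finite example makes the effect of $t$ on the duality gap concrete. The data below are an assumption chosen for illustration only: $X = \{0, 1, 2\}$, $f(0) = f(1) = 0$, $f(2) = 4$ and $c(x) = x - 1$, so that the only feasible point is $x = 1$ with primal value 0. The minimization over $u$ is done on a coarse grid, which is enough here because the minimizer $u = 2$ lies on the grid; it is not a general-purpose method.

```python
X = [0.0, 1.0, 2.0]
F = {0.0: 0.0, 1.0: 0.0, 2.0: 4.0}      # objective values f(x), toy data


def c(x):
    """Dualized constraint c(x) = 0; only x = 1 is feasible."""
    return x - 1.0


def theta_t(u, t):
    """Augmented dual function: max over X of f(x) - (t/2) c(x)^2 - u c(x)."""
    return max(F[x] - 0.5 * t * c(x) ** 2 - u * c(x) for x in X)


def min_theta_t(t):
    """Minimize theta_t over a grid of multipliers u in [-10, 10]."""
    grid = [k / 10.0 for k in range(-100, 101)]
    return min(theta_t(u, t) for u in grid)


for t in (0.0, 2.0, 4.0):
    print(t, min_theta_t(t))
```

With $t = 0$ (ordinary Lagrangian relaxation) the dual optimal value is 2, a gap of 2 above the primal value; the gap shrinks to 1 for $t = 2$ and vanishes for $t = 4$.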

The price to pay is that Assumptions 2 and 7(iii) are usually no longer valid
when $L$ is replaced by $L_t$; as a result, augmented Lagrangian can hardly be used
in practice. Nevertheless, (i) suggests the ability of augmented Lagrangian to
stabilize primal infeasibilities, so that the idea can be translated to the numerical
field - and this is what bundle methods do.

For an illustration, consider column generation. Using Assumption 7(ii), formulate its
convex relaxation in terms of convex multipliers $\alpha$:

$$\max \sum_{k=1}^{\infty} \alpha_k b^\top\varkappa_k, \qquad \sum_{k=1}^{\infty} \alpha_k A\varkappa_k = a \in \mathbb{R}^m. \qquad (39)$$

Standard column generation defines approximations $\hat x$, with $\infty$ replaced by a
smaller number $K$; a very natural idea, just requiring an appropriate mechanism
to generate $\varkappa_{K+1}$, namely the satellite. The bundle approach takes a multiplier
vector $u \in \mathbb{R}^m$, a penalty coefficient $t > 0$, forms the augmented Lagrangian

$$L_t(\alpha,u) = \sum_{k=1}^{\infty} \alpha_k f(x_k) - \frac{t}{2}\Big\|\sum_{k=1}^{\infty} \alpha_k c(x_k)\Big\|^2 - u^\top\sum_{k=1}^{\infty} \alpha_k c(x_k), \qquad (40)$$

and makes the following reasoning:

- Standard duality/penalty theory says that maximizing $L_t(\cdot,u)$ is equivalent to
(39) in two cases:

- if $u$ is a dual solution (and $t > 0$ is arbitrary); this results from standard duality
results, for example (the unit-simplex is compact);

- if $t \to +\infty$ (and $u$ is arbitrary in $\mathbb{R}^m$); this natural result can be proved
rather easily.

- Unfortunately, both problems have comparable complexity, so nothing is really
gained so far.

- A column-generation-like technique is therefore applied to the maximization
of $L_t(\cdot,u)$ (even though it is unconstrained - but one can always set
$\hat g := -\sum_k \alpha_k c(x_k)$, considered as a constraint): $\infty$ is replaced by $K$ to obtain
estimates $\hat x = \sum_{k=1}^K \hat\alpha_k x_k$; again a very natural idea if the augmented
Lagrangian idea is accepted as natural.

- However, the optimal $u$ is not known; hence, simultaneously to the process
generating $\hat x$, one generates dual estimates $\hat u$. In addition to the satellite, a
mechanism is required to generate $u_{K+1}$: this is (32).

Inequality Constraints. As seen earlier, primal inequality constraints
entail nonnegative dual variables: (32) is changed to

$$\begin{cases} \min\ r + \frac{1}{2t}\|u - \hat u\|^2, & (u,r) \in \mathbb{R}^m \times \mathbb{R}, \\ r \ge \theta(u_k) + g_k^\top(u - u_k) & \text{for } k = 1,\dots,K, \\ u \ge 0 \end{cases}$$

(knowing that (28) is recovered if $t \to +\infty$). As for the primal space, the reader
may check that dualizing the above quadratic problem - with the additional
term $-s^\top u$ in the (bi)Lagrangian - results in transforming (37) to maximizing
over $\alpha \in \Delta_K$ and $s \ge 0$

$$\sum_{k=1}^K \alpha_k f(x_k) - \frac{t}{2}\|\hat g(\alpha) - s\|^2 + \hat u^\top(\hat g(\alpha) - s),$$

where $\hat g(\alpha) := -\sum_{k=1}^K \alpha_k c(x_k)$, as before: this formulation reveals the term
$\sum_{k=1}^K \alpha_k c(x_k) + s$, coming from the constraint $c(x) + s = 0$, $s \ge 0$.

This again goes with the general theory: using the slackened formulation of
an inequality-constrained primal problem, the augmented Lagrangian is

$$L'_t(x,s,u) = f(x) - \frac{t}{2}\|c(x) + s\|^2 - u^\top(c(x) + s),$$

to be maximized with respect to $x \in X$ and $s \ge 0 \in \mathbb{R}^m$. Incidentally, maximization
with respect to $s$ can be carried out explicitly; for given $x$, we obtain the
$m$-vector $s = s(x)$ whose $j$th component is $\max\{0, -c_j(x) - u_j/t\}$. With this
notation, the augmented dual function is then

$$\theta'_t(u) = \max_{x \in X}\ L(x,u) - u^\top s(x) - \frac{t}{2}\|c(x) + s(x)\|^2.$$
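
The explicit formula for the slack can be checked numerically. The sketch below evaluates the part of $L'_t$ that depends on $s$ and compares the closed-form slack with a brute-force search over a grid of nonnegative slacks; the function names and the numerical values of $c(x)$, $u$ and $t$ are assumptions made for the illustration.

```python
def optimal_slack(c_x, u, t):
    """Closed-form maximizer: s_j = max(0, -c_j(x) - u_j / t)."""
    return [max(0.0, -cj - uj / t) for cj, uj in zip(c_x, u)]


def slack_terms(c_x, s, u, t):
    """Part of the augmented Lagrangian depending on s:
    -(t/2) ||c(x) + s||^2 - u . (c(x) + s)."""
    r = [cj + sj for cj, sj in zip(c_x, s)]
    return -0.5 * t * sum(v * v for v in r) - sum(uj * v for uj, v in zip(u, r))


c_x = [-2.0, 0.5]      # values c_j(x) at some fixed x (toy data)
u = [1.0, 0.3]         # multipliers
t = 2.0                # penalty coefficient

s_star = optimal_slack(c_x, u, t)
best_on_grid = max(
    slack_terms(c_x, [0.1 * i, 0.1 * j], u, t)
    for i in range(51) for j in range(51)
)
print(s_star, slack_terms(c_x, s_star, u, t), best_on_grid)
```

No grid point does better than the closed-form slack, in agreement with the componentwise formula.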

5.3 Quadratic Solvers and Poorman Bundle Methods 

The main argument against bundle methods is that they replace an “easy”
linear program (28) or (30) by a “difficult” quadratic program (32) or (37).
This explains various proposals, in which the stabilizer is polyhedral instead of
quadratic; the model $\breve\theta$ of (31) is replaced by, say

$$\tilde\theta(u) := \hat\theta(u) + \frac{1}{t}\|u - \hat u\|_\infty.$$

A similar stabilization was proposed in the seminal boxstep method of [50]: the
idea was to minimize the model

$$\overset{\sqcup}{\theta}(u) := \begin{cases} \hat\theta(u) & \text{if } \|u - \hat u\|_\infty \le \rho, \\ +\infty & \text{otherwise,} \end{cases}$$

i.e. to minimize $\hat\theta$ over the trust-region $\|u - \hat u\|_\infty \le \rho$. The purpose of this section
is to study the issue.

Why a Quadratic Master? First there are good reasons to use a quadratic
term in the model.

(i) One is by analogy with smooth optimization. As mentioned in Remark 29,
quadratic terms in the model, incorporating second-order information on
the objective function, are crucial for efficiency.

Indeed it is shown in [23 § 11.2.2(b)] that even a very naive quadratic 
term such as in (i.e. an extremely coarse second-order approximation), 
may improve drastically the performances: the number of iterations may 
be divided by a factor of 10^. 

(ii) Similarly, for the so-called composite optimization problem (which basically 
means to minimize the maximum of finitely many smooth functions), a good 
model is the sum of a polyhedral and of a quadratic function, even quite 
naive as above; see [13 Chap. 14], but also [S2l Chap. 3]. 

By contrast, a polyhedral stabilization - such as in the model $\overset{\sqcup}{\theta}$ of boxstep - is
strongly biased: it favors the extreme points of the corresponding polyhedral
unit-ball, irrespective of the original problem (think of the basis vectors,
which are the extreme points of the $\ell_1$ unit-ball; their cosine with the “ideal”
direction - pointing to an optimal solution - is most probably of the order
$1/n$).

(iii) In our present context, motivated in §4.3, a quadratic term has several
advantages.

- It guarantees compactness: the model $\breve\theta$ of (31) always has a minimum
point, for any $t > 0$; such is not the case with the above $\tilde\theta$ (stabilized by
the $\ell_\infty$-norm), for example.

- It preserves the possible first-order approximation property of the cutting-plane
model: $\breve\theta(\hat u + h) = \hat\theta(\hat u + h) + o(\|h\|)$; here again, $\tilde\theta$ does not.

- It consistently stabilizes $u_{K+1}$, no matter how $t > 0$ is chosen, and no
matter how $\hat u$ is close to a minimum point of $\theta$. By contrast, $\tilde\theta$ may be
minimal at $\hat u$, an unfortunate event; as for the boxstep model $\overset{\sqcup}{\theta}$, its asymptotic efficiency
necessitates to have $\rho \to 0$ at a suitable speed (if $\hat\theta$ reaches its minimum
in the interior of the box of radius $\rho$ centered at $\hat u$, the boxstep precaution
could just be forgotten).

(iv) The case of ACCPM is interesting. First, it does not cure by itself the
compactness problem: a center is not defined for an unbounded set $P_K$ in §4.3.
Besides, a naive quadratic term (with $\hat u$ fixed to 0) is again beneficial, at
least theoretically: see [55].

A prospective argument could be added to the above list: in some future,
research in convex analysis might very well produce appropriate quadratic
models, less primitive than the mere Euclidean norm in $\breve\theta$. Then a quadratic
master will become a must: the difference between pure cutting planes of §4.2 and
such “second-order like” methods will be as large as, say, the difference between
Gauss-Seidel and Newton for systems of equations.
Efficiency of Quadratic Solvers. The belief that a quadratic master cannot
be solved as efficiently as a linear one is indeed misleading.

Observe first that linear and quadratic programming call essentially for the 
same techniques: 

- either pivotal (0, [m Chap. 10]), the only real difference being that the di- 
mension of the optimal basis is not known in advance; 

- or using interior points [57I7B]. 

Actual implementations solve the dual (37), which has a fairly particular
form: a fixed simplicial constraint structure, and a Hessian which is simply the
Gram matrix $[g_i^\top g_j]$ of the subgradients. Besides, its complexity is just
proportional to $K$, the number of pieces in the bundle: the dimension $m$ of the
dual space does not count. Observe for example that, with $K = 1$, we obtain from (35)
the explicit solution $u_2 = \hat u - t g_1$ (in this case, $\partial\hat\theta$ is everywhere reduced to the
singleton $\{g_1\}$; alternatively, (37) has the single feasible point $\alpha_1 = 1$).

Another important point is that the present context lends itself to restarts,
just as in column generation. From the $K$th master (37) to the $(K+1)$st, the
differences are:

- one more variable $\alpha_{K+1}$; setting it to 0 and appending it to the old optimal
solution $\hat\alpha$ produces a feasible point in the new master (37);

- one more row-column in the Hessian matrix, obtained by computing the $K+1$
scalar products $g_k^\top g_{K+1}$, $k = 1,\dots,K+1$;

- one more linear term $f(x_{K+1}) - \hat u^\top c(x_{K+1})$;

- possibly a change in the old linear terms $f(x_k) - \hat u^\top c(x_k)$, in case the stability
center $\hat u$ has been updated (descent step);

- a few other minor changes in case the bundle has been cleared - see below.

Among available software exploiting fully these features, let us cite [30, 31, 16].
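
The row-column update of the Hessian mentioned in the list above amounts to appending $K+1$ scalar products to the Gram matrix of the subgradients. A minimal sketch follows; storing the matrix as a list of lists and the names `dot` and `extend_gram` are assumptions made for the illustration, not a description of the cited software.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def extend_gram(gram, subgradients, g_new):
    """Return the Gram matrix enlarged by one row and one column.

    Only K + 1 new scalar products are computed; the old K x K block is reused.
    """
    row = [dot(g, g_new) for g in subgradients]          # K products with old pieces
    new_gram = [old_row + [v] for old_row, v in zip(gram, row)]
    new_gram.append(row + [dot(g_new, g_new)])           # the (K+1)st product
    return new_gram


subgradients, gram = [], []
for g in ([1.0, 0.0, 2.0], [0.0, -1.0, 1.0], [3.0, 1.0, 0.0]):
    gram = extend_gram(gram, subgradients, g)
    subgradients.append(g)
print(gram)
```

The cost of each update grows with the bundle size $K$ and with the cost of one scalar product, while the previously computed entries are never touched.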

Indeed, it is safe to say that the future of nonlinear optimization now requires 
a substantial development of quadratic programming software, analogous to the 
tremendous impetus of the last 50 years for linear programming. Such software, 
incidentally, is also required for “ordinary” smooth optimization problems, which 
are nowadays solved by the so-called sequential quadratic programming approach
(SQP); see |in] §12.4], Chap. 18]. 



Clearing the Bundle: Cheap Quadratic Masters. At least in theory, the
quadratic master can be made straightforward, as the bundle size can be kept as
small as 2. In fact, it is always possible to eliminate from the bundle any number
of pieces, provided that one keeps a so-called aggregate piece, which is the affine
function

$$\theta^{a}(u) := \hat\theta(u_{K+1}) + \hat g^\top(u - u_{K+1}). \qquad (41)$$

Using (35), we see that

$$\theta^{a} \le \hat\theta \quad\text{and}\quad \theta^{a}(u_{K+1}) = \hat\theta(u_{K+1}).$$

The aggregate piece can thus be viewed as another linearization of $\theta$, which can
be incorporated into the bundle without harm.

Remark 35 The above aggregation has a nice primal interpretation. First
observe that each piece in the bundle can be expressed in terms of the Lagrangian:

$$\theta(u_k) + g_k^\top(u - u_k) = L(x_k, u).$$

Next, using complementary slackness in (32) and remembering the primal points
$x_k$, we see that $\hat\theta(u_{K+1}) = r_{K+1}$ is equal to

$$\sum_{k=1}^K \hat\alpha_k\big(\theta(u_k) - g_k^\top u_k\big) + \hat g^\top u_{K+1} = \sum_{k=1}^K \hat\alpha_k L(x_k, u_{K+1}),$$

so that the aggregate piece

$$\theta^{a}(u) = \sum_{k=1}^K \hat\alpha_k L(x_k, u) = \sum_{k=1}^K \hat\alpha_k f(x_k) - u^\top\sum_{k=1}^K \hat\alpha_k c(x_k)$$

is a convex combination of Lagrangians. Besides, this writing connotes the primal
candidate $\hat x = \sum_k \hat\alpha_k x_k$, already encountered on several occasions. □

The aggregate piece summarizes the information contained in the bundle in
such an efficient way that it suffices by itself to guarantee convergence of the
dual process:

Theorem 36 The convergence properties (i)-(v) of Theorem 30 remain valid if,
prior to any iteration $K + 1$, the following operations are performed:

- any existing piece $(\theta(u_k), g_k)$ (possibly all) is cancelled from the bundle;

- the aggregate piece $(\hat\theta(u_{K+1}), \hat g)$ is inserted;

- the new piece $(\theta(u_{K+1}), g_{K+1})$ is appended. □

As a result, the complexity of the master can be controlled quasi at will: any
number of pieces can be eliminated, at any time¹⁶. Naturally, if all eliminated
pieces $k$ have $\hat\alpha_k = 0$, there is no need to insert the aggregate piece: it is implicitly
present. On the other hand, the simplest possible master at step $K + 1$ is

$$\min\ r + \frac{1}{2t}\|u - \hat u\|^2, \qquad r \ge \theta^{a}(u), \qquad r \ge \theta(u_{K+1}) + g_{K+1}^\top(u - u_{K+1}). \qquad (42)$$

Theorem 36 is explained as follows. The only existing scheme to prove
convergence of a bundle method is to argue that the optimal value in (32), with $K$
replaced by $K + 1$, is certainly not smaller than that of (42), which has fewer constraints.
Then it is rather easy to analyze the poorman bundle method, in which the
bundle is fully cleared at each iteration and the master (42) is solved. Convergence
results for this method are a fortiori true for any “richer” strategy.

¹⁶ Incidentally, this is by no means shared by the cutting-plane methods of §4.2,
although some works exist to eliminate inactive pieces ([70]): a bundle of dimension
at least $m + 1$ must of course be maintained (otherwise $\hat\theta$ cannot be bounded from
below).
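
A poorman iteration can be sketched in one dimension, keeping only the aggregate piece and the newest piece as in (42). The toy function, the names `oracle`, `solve_master` and `poorman_bundle`, and the parameter values are assumptions made for this illustration; the one-dimensional master is solved by comparing its three possible minimizers.

```python
def oracle(u):
    """Toy oracle (an assumption): theta(u) = |u - 2| + 0.5 |u + 1| and one subgradient."""
    value = abs(u - 2.0) + 0.5 * abs(u + 1.0)
    g = (1.0 if u >= 2.0 else -1.0) + 0.5 * (1.0 if u >= -1.0 else -1.0)
    return value, g


def solve_master(pieces, center, t):
    """Exact 1-D solution of the two-piece master: compare the stationary
    point of each line and, if the slopes differ, their crossing."""
    def model(u):
        return max(b + g * u for b, g in pieces)

    def objective(u):
        return model(u) + (u - center) ** 2 / (2.0 * t)

    candidates = [center - t * g for _, g in pieces]
    (b1, g1), (b2, g2) = pieces
    if g1 != g2:
        candidates.append((b2 - b1) / (g1 - g2))
    u_next = min(candidates, key=objective)
    return u_next, model(u_next)


def poorman_bundle(u0, t=1.0, kappa=0.1, tol=1e-10, max_iter=200):
    center = u0
    theta_center, g = oracle(center)
    first = (theta_center - g * center, g)           # line u -> b + g * u
    pieces = [first, first]
    for _ in range(max_iter):
        u_next, model_value = solve_master(pieces, center, t)
        delta_hat = theta_center - model_value       # predicted decrease
        g_hat = (center - u_next) / t                # regularized gradient, from (35)
        if delta_hat <= tol and abs(g_hat) <= tol:
            break
        aggregate = (model_value - g_hat * u_next, g_hat)   # affine function (41)
        value, g = oracle(u_next)
        new_piece = (value - g * u_next, g)
        if value <= theta_center - kappa * delta_hat:
            center, theta_center = u_next, value     # descent step
        pieces = [aggregate, new_piece]              # bundle fully cleared
    return center, theta_center


print(poorman_bundle(10.0))
```

On this toy function the two-piece bundle still reaches the minimizer $u = 2$; this says nothing about relative speed on harder problems, which is a matter for numerical experience.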

In other words: no theoretical result establishes a difference between maximal 
and minimal bundles. This is rather frustrating, as a rich bundle method can 
reasonably be expected to converge faster than a poor one. This belief can be 
assessed only by computational experience, though, which is the subject of the 
next section. 

5.4 Numerical Illustrations 

For illustration, we applied a bundle method to the Held-Karp dual of a traveling
salesman problem [2^, with datasets from tsplib; all the graphs were complete, 
with a dimension m of the dual space equal to the number of nodes. The dual 
algorithm was that of |4^, with the quadratic solver of j^. The oracle to com- 
pute the 1-tree for given u was very elementary, inspecting systematically the 
$O(m^2)$ possibilities, and storing no distance matrix: the $m^2$ (Euclidean) distances
were recomputed every time they were needed. Table 1 gives the results obtained
when the algorithm was pushed to optimality¹⁷. The runs were performed with
a maximum bundle size of 1000.



Table 1. Bundle method pushed to full optimality

Problem    $m$     $-\theta(\hat u)$    $K$     $K_d$   $\ell$   CPU     % master   % oracle
gr120      120     1606.3125            132     40      42       1s      35         44
pcb442     442     50499.499            556     109     116      151s    64         28
pcb1173    1173    56350.993            502     65      81       438s    31         64
pcb3038    3038    136587.50            4212    324     367      10h     30         56
fnl4461    4461    181569.21            6965    470     361      30h?    8?         87?

In this table, $K$ is the number of iterations needed (each iteration is one
resolution of the master (37) + one maximization of the Lagrangian); $K_d$ is the
number of descent steps, updating the stability center $\hat u$; $\ell$ is the number of active
constraints in the final master (32). The last three columns concern computing
times. Some observations are worth mentioning.

- We also tested fl1577 but failed: after some 80 000 iterations (about a week of
computer time), we still had $\theta(\hat u) = -21\,833$, a value not close to $\min\theta$.

- The last three columns were obtained by the unix command gprof and are
indicative only. In particular, gprof gave for fnl4461 an implausible computing
time of 4200s; hence our question marks in the corresponding entries of Table 1.

The stopping test is explained in [41J . Given a threshold e set by the user, two 
corresponding tolerances are computed for S of (I29|l and ||g|| of (I35II . implying that 9 
is minimized within relative accuracy e. All the runs reported in Table[T]had e = 10~® 
and terminated with S ^ 10“^ and ||g|| ^ 10“®. 



150 



C. Lemarechal 



- The CPU time spent in Lagrangian maximizations (last column) becomes
heavier when $m$ is larger; by comparison, the time spent in the master problem
is light during the first iterations, when $K \ll 1\,000$.

- The values of $\ell$ are interesting. With a pure cutting-plane method, $\ell$ could only
equal $m + 1$. Now $\ell$ has several interpretations:

- It is the dimensionality of $\partial\theta(\hat u)$; if $\theta$ were differentiable, we would have $\ell = 1$,
the oracle answering some $g_K = 0$.

- From a primal point of view, $\ell$ is the number of $x_k$’s making up the optimal
solution $\hat x$ of the relaxed primal problem.

- Because $\ell \ll m$, we have to conclude that the set of dual optimal solutions
has fairly large dimensionality¹⁸.

- Alternatively, the optimal set of the primal relaxed problem has fairly small
dimensionality.

Note, however, that all this is highly informal, and subject to some
approximation anyway; Table 4 shows that it could be true only in a perfect world
with everything computed exactly - including the solution of (32).

Using the same test-problems, we report the same information with the
poorman bundle method of §5.3. More precisely, the method is still that of [13], with
a bundle systematically reduced to three elements: the aggregate and the new
ones, as required by Theorem 36, and the $g$ returned by the oracle at the current
$\hat u$ (this element is useful for the $t$-update of [13]). Table 2 reads as Table 1, knowing
that the algorithm was interrupted “manually” when 4 exact digits were
obtained in the objective function.



Table 2. Poorman bundle method

Problem    $-\theta(\hat u)$    $K$     $K_d$   CPU      % master   % oracle
gr120      1606.1746            173     26      1s       1          89
pcb442     50495.121            233     29      19s      2          94
pcb1173    56345.576            236     26      155s     1          94
pcb3038    136573.85            6920    482     5h45     0          94
fnl4461    181551.12            411     41      45min    0          95



For a more complete comparison, Table 3 reports the number of iterations
respectively needed by the two versions (poor and rich), to obtain 2, 3, 4 exact 
digits (i.e. relative accuracies from 10“^ to 10“^; mi = 0 is already 10“^-optimal 
in these problems). The last double column summarizes the computing time 
to reach 10“^ relative accuracy. In each double column, the cheap version is 
reported first. 

Table 3. Comparison of poor and rich versions

Problem    $10^{-2}$       $10^{-3}$       $10^{-4}$        CPU ($10^{-3}$)
           poor   rich     poor   rich     poor   rich      poor    rich
gr120      21     24       93     72       173    102       0s      0s
pcb442     24     23       131    79       233    261       10s     6s
pcb1173    27     22       104    93       236    187       61s     63s
pcb3038    26     28       128    146      6920   782       540s    630s
fnl4461    23     21       126    120      411    811       760s    800s

¹⁸ We have $\dim\partial\theta(\hat u) = \ell - 1$. Because $\theta$ is piecewise affine, $\theta(u) = \theta(\hat u) + \theta'(\hat u, u - \hat u)$
for $u - \hat u$ small enough; and $\theta'(\hat u, u - \hat u) = 0$ if $u - \hat u \perp \partial\theta(\hat u)$.

Our final experiments illustrate how the algorithm proves optimality of a
given $u$. Still the same test-problems are used, and the bundle method is run
starting from an optimal $u_1$ (obtained from the runs in Table 1); $t > 0$ is kept
small. Then only null-steps are performed and every sampling point $u_k$ is close
to $\hat u = u_1$, so that $g_k \in \partial\theta(\hat u)$.



Table 4. Proving optimality

Problem    $-\theta(\hat u)$    $K$    $K_d$   $\ell^*$   final $\|\hat g\|$
gr120      1606.3125            47     18      42         2 X 10"®®
pcb442     50500.               105    24      105        3 X 10"^°
pcb1173    56361.               93     27      91         5 X 10"“
pcb3038    136587.5             700    39      570        2 X 10“®
fnl4461    181569.21            357    35      357        1 X 10"^°



The results are given in Table 4. Because we are dealing with approximations
only, the tests serve as purification: the starting $u_1$ is not exactly optimal, a few
descent-steps may occur, the $u_k$’s may not all be in an appropriate face¹⁹ of $\mathrm{epi}\,\theta$,
the final $\ell$-value may not be the same as in Table 1. This is why we report again
the final $\theta(\hat u)$, the number $K_d$ of descent-steps, and $\ell$. The latter is denoted by
$\ell^*$, to be distinguished from the $\ell$ of Table 1. Of course, we must have $K \ge \ell^*$;
actually, $K - \ell^*$ represents the number of “errors” made by the algorithm in
identifying the necessary optimal faces; said otherwise, the algorithm computes
$K$ subgradients at $\hat u$ (or close to), $\ell^*$ of which are finally used to make up
the 0 vector. Table 4 gives also the norm of this (approximate) 0 vector, whose
accuracy is assessed by the fact that each $g_k$ from the oracle is an integer vector,
and therefore has norm larger than 1.



Concluding Remarks. Inspection of the above numerical results suggests a 
number of comments. 

(i) Bundle methods appear as reasonably efficient alternatives to traditional
algorithms for Lagrangian relaxation, hence for column generation. Note
in particular that all our reported experiments used a fairly simple oracle,
delivering one column at a time. Notwithstanding, serious comparisons are
still to be made between bundle, subgradient, cutting planes and ACCPM.

¹⁹ i.e. $(u_k, \theta(u_k))$ lies in a face of $\mathrm{epi}\,\theta$ containing the optimal $(\hat u, \min\theta)$.

(ii) A dual problem of large dimension $m$ is likely to be difficult. However, $\ell^*$,
which resembles the dimensionality of an optimal $\partial\theta$, seems a lot more
important: a problem with small $\ell^*$ (or with a solution set of large dimension)
seems relatively “easy”.

(iii) Just as finiteness, piecewise affinity is a concept that should be taken with
care: there is no clear-cut distinction between a piecewise affine $\theta$ (finitely many
possible answers from the oracle) and a truly nonlinear one (a continuum of possible
answers). This is illustrated by some observations, in our present context:

- Any run of gr120, with various values given to various parameters (starting
point, stopping tolerance, bundle size, etc.), invariably produced
$\ell^* = 42$. Besides, $\|\hat g\|$ decreased regularly along the iterations down to
about 10^-[…], and then fell abruptly to 10^-[…], clearly revealing finite
convergence.

- For all other test-problems, variations in the parameters produced fluctuations
of about 10% in the final $\ell^*$. This occurred even with pcb1173
(seemingly easy), and also when starting from an optimal $u_1$. Besides, $\|\hat g\|$
decreased regularly to its minimal value around […]. Such a behaviour
definitely connotes a truly nonlinear problem.

(iv) The poorman variant is surprisingly efficient. With Remark […] in mind, it
assesses older works such as […]. Good behaviour of cheap bundle methods
was already observed in […] and related works. Such variants are of
course very attractive, as they use little computing time, and are relatively
easy to encode (although a good knowhow is strongly advised: nonsmooth
optimization is by no means an easy exercise).

The above observation can be turned the other way: “richman” bundle
methods are disappointing (observe in particular the disaster for fnl4461
in Table 2, in which the richer version is the slower). The information
contained in a bundle of size 1000 has to be worth something: if this
information does not help to improve efficiency, for sure it is misused; and of
course the culprit is the Euclidean norm (a “poorman” stabilizer). This is
why research has been and still is conducted, to define richer stabilizers:
see Remark […] again. Some works already exist proposing stabilizers of the
form $\frac{1}{2}(u - \hat u)^\top M_k (u - \hat u)$: see [51], [17], [53], and also [48], where convexity of
$\theta$ is not even used. On this “quasi-second-order” subject, the final word is
far from being said.

(v) By contrast, Table 4 illustrates quite an interesting behaviour: since $\ell^*$ is a
lower bound for $K$, the algorithm could hardly do better in the situation
of the experiments (see footnote 20). See also the results in [23, §IX.3.3], which display a
similar behaviour. The role of the stabilizer is here blatantly demonstrated.
Indeed the conditions are here most favourable for $\theta$ to look like a piecewise
affine function: many answers from the oracle are eliminated from the picture.
Yet the far-from-piecewise-affine model of $\theta$ used in […] allows an accurate
description of $\theta$.

20 However pcb3038 shows that fast convergence is never guaranteed. Our trials to
obtain $\|\hat g\|$ around 10^-[…] remained fruitless: the quadratic solver ran into difficulties,
$K$ was much larger than $\ell^*$, the algorithm had to be restarted several times, etc.

We believe that here lies the crux of nonsmooth optimization. To minimize
a nonsmooth function such as $\theta$, two processes are involved:

- To diminish $\theta$ down to its minimal value; in Lagrangian relaxation, this
means to generate best possible upper bounds for the primal problem.

- To prove optimality of $\bar u$, i.e. to generate an $x$ which solves the relaxed
primal problem. In cutting-plane methods of §[…], each $x$ is feasible; what
is needed is to increase the corresponding value to $\theta(\bar u)$. In
a bundle method, the problem is mainly to decrease $\|\hat g\|$ down to 0.

Because of our two observations (iv) and (v), the two processes seem fairly 
unrelated and call for fairly different tools. The difficulty is to find an appro- 
priate model (such as 9 or 9) in which they groove together harmoniously. 
Abstract. Branch-and-cut (-and-price) algorithms belong to the most
successful techniques for solving mixed integer linear programs and
combinatorial optimization problems to optimality (or, at least, with
certified quality). In this unit, we concentrate on sequential branch-and-cut
for hard combinatorial optimization problems, while branch-and-cut for
general mixed integer linear programming is treated in [→ Martin] and
parallel branch-and-cut is treated in [→ Ladanyi/Ralphs/Trotter]. After
telling our most recent story of a successful application of branch-and-cut
in Section 1, we give in Section 2 a brief review of the history,
including the contributions of pioneers with an emphasis on the
computational aspects of their work. In Section 3, the components of a generic
branch-and-cut algorithm are described and illustrated on the traveling
salesman problem. In Section 4, we first elaborate a bit on the important
separation problem where we use the traveling salesman problem
and the maximum cut problem as examples, then we show how
branch-and-cut can be applied to problems with a very large number of variables
(branch-and-cut-and-price). Section 5 is devoted to the design and
applications of the ABACUS software framework for the implementation
of branch-and-cut algorithms. Finally, in Section 6, we make a few
remarks on the solution of the exercise consisting of the design of a simple
TSP-solver in ABACUS.



1 Our Most Recent Story 

Branch-and-cut has become a widely used method for the solution of hard
integer or mixed integer problems. We refer to a recent survey of Caprara and
Fischetti […] for a view on its wide range of applications. In this chapter, the
emphasis will be mostly on combinatorial optimization problems. Before going
into a brief historical tour through the main algorithmic achievements that led
to or are connected with branch-and-cut, and hoping to get the reader’s interest
before entering into more technical topics, we want to report on our most recent
experience […] with this method. The little story that follows is not only an
example where branch-and-cut was quite useful, but also shows how
combinatorial optimization can sometimes provide an excellent modeling tool to solve
real world problems.



M. Jünger and D. Naddef (Eds.): Computat. Comb. Optimization, LNCS 2241, pp. 157-222, 2001.
During the seminar “Constraint Programming and Integer Programming” in
Schloß Dagstuhl, Germany, 17-21 January 2000, the participants tried to identify
problems for which a fruitful interaction/competition of constraint programming
techniques and integer/combinatorial optimization techniques appears
challenging and is likely to enhance the interaction of both communities. One problem
area the participants agreed upon consists of various feasibility/optimization
problems occurring in scheduling sports tournaments. The break minimization
problem for sports leagues was addressed by both communities during the
workshop, in particular by Jean-Charles Régin of the constraint programming
community and by Michael Trick of the integer programming/combinatorial
optimization community.



1.1 Break Minimization 

We deal with the situation where in a sports league consisting of an even number
$n$ of teams each team plays each other team once in $n - 1$ consecutive weeks.
Each game is played in one of the two opponents’ home towns and each feasible
schedule has the following properties:

F1 For each team, the teams played in weeks $1, \ldots, n - 1$ are a permutation of
all other teams.

F2 If in week $w$ team $i$ plays team $j$ “at home” (“+”) then team $j$ plays team
$i$ in week $w$ in $i$’s town, i.e., “away” (“−”).

Fig. 1 shows two possible schedules for a league of eight teams. The rows
show the game plan for each team, column 1 displays a team, column
$w \in \{2, \ldots, n\}$ shows the opponent in week $w - 1$, and “+” and “−” indicate
if the game is at home or away, respectively.

(a) (b)

Fig. 1. Two feasible schedules for eight teams

In sports scheduling it is considered undesirable
if any team plays two consecutive games either both at home or both away.
Such a situation is called a break. We are given a feasible tournament schedule
without home-away assignment and our task is to find a feasible home-away
assignment that minimizes the number of breaks. The schedule of Fig. 1(a)
imposes 8 breaks whereas the schedule in Fig. 1(b) imposes only 6 breaks. In
both cases, the number of breaks is minimum. Schreuder [66] has shown that,
for an even number $n$ of teams, it is always possible to construct a schedule
that allows a home-away assignment with the minimum number of $n - 2$ breaks
and he has given an efficient algorithm to compute such a schedule (along with
the home-away assignment). However, sport tournament schedules are subject
to a number of requirements, among them restrictions such as “geographically
close teams should not play at home during the same week”. Some authors (like
Schreuder […]) propose to start with an optimum schedule with $n - 2$ breaks
and incorporate the additional requirements at the cost of more breaks; others
(like Régin [62, …] and Trick […]) propose to consider a schedule without
home-away assignment that obeys the various (often not formally describable) side
conditions and compute a home-away assignment so as to minimize the number of
breaks. It is the latter attitude we take here.

Régin formulated a constraint programming model with 0-1-variables and was
able to solve instances up to size 20. Trick introduced an integer programming
formulation and was able to solve instances up to size 22.

The complexity status of the break minimization problem has not yet been
determined (to the best of our knowledge); we believe it is NP-hard.
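To make the notion of a break concrete, the following Python sketch counts breaks and finds a minimum-break home-away assignment by exhaustive enumeration for a tiny league. The data layout (`opp[i][j]` is the 0-indexed opponent of team `i` in week `j`) and the function names are assumptions made for illustration only; enumeration is viable only for very small $n$ and is not the method used in this chapter.

```python
from itertools import product

def count_breaks(home):
    """home[i][j] is True if team i plays at home in week j.
    A break is a pair of consecutive weeks with the same home/away status."""
    return sum(row[j] == row[j + 1] for row in home for j in range(len(row) - 1))

def min_breaks_bruteforce(opp):
    """Try every home-away assignment that respects property F2."""
    n, weeks = len(opp), len(opp[0])
    # One entry per game: (lower-indexed team, week).
    games = [(i, j) for j in range(weeks) for i in range(n) if i < opp[i][j]]
    best = None
    for bits in product([True, False], repeat=len(games)):
        home = [[None] * weeks for _ in range(n)]
        for (i, j), at_home in zip(games, bits):
            home[i][j] = at_home
            home[opp[i][j]][j] = not at_home  # F2: the opponent plays away
        breaks = count_breaks(home)
        if best is None or breaks < best[0]:
            best = (breaks, home)
    return best

# A round-robin schedule for four teams (hypothetical toy instance).
opp = [[1, 2, 3], [0, 3, 2], [3, 0, 1], [2, 1, 0]]
breaks, home = min_breaks_bruteforce(opp)
print(breaks)
```

For four teams the enumeration covers only $2^6$ assignments; the count doubles with every additional game, which is why a model such as the one developed next is needed.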



1.2 From Minimizing Breaks to Maximizing Cuts 

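Given a feasible tournament schedule without home-away assignment

$$\begin{array}{c|cccc}
1 & t_{11} & t_{12} & \cdots & t_{1,n-1} \\
2 & t_{21} & t_{22} & \cdots & t_{2,n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
n & t_{n1} & t_{n2} & \cdots & t_{n,n-1}
\end{array}$$

in which $t_{ij} \in \{1, 2, \ldots, n\} \setminus \{i\}$ is the opponent of team $i$ in week $j$, we construct
an undirected graph $G = (V, E)$ with a node $v = (i,j) \in V$ for $i \in \{1, 2, \ldots, n\}$
and $j \in \{1, 2, \ldots, n - 1\}$. There is an edge in $E$ between nodes $(i,j)$ and $(k,l)$
in $V$ if and only if $i = k$ and $1 \le j = l - 1 \le n - 2$. I.e., $G = (V, E)$ consists of
$n$ disjoint paths, one per team, each linking the nodes of consecutive weeks.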




A cut in $G = (V,E)$, i.e., an edge set $C$ of the form $C = \delta(W)$, $W \subseteq V$,
where $\delta(W) = \{vw \in E \mid v \in W, w \in V \setminus W\}$, partitions $V$ into $V = V^+ \cup V^-$
($V^+ \cap V^- = \emptyset$), where $V^+$ and $V^-$ are called the different shores of the cut. Any
home-away assignment corresponds to a cut in $G$ such that, by [F2], $(i,j) \in V^+$
if and only if $(t_{ij}, j) \in V^-$, and $|E| - |C|$ is the number of breaks. This condition
is modeled by stipulating that $(i,j)$ and $(t_{ij}, j)$ belong to different shores of the
cut. We could model this by assigning a capacity of 1 to all edges of $G$ introduced
so far and adding edges with a value of $M > n(n - 2) + 1$ between all
such pairs of nodes: Fig. 2 shows the graph $G$ resulting from this transformation
for the instance given in Fig. 1(a).




Fig. 2. Result of the big-M transformation (thin edges have capacity 1, fat edges have capacity $M$)



The choice of $M$ would trivially guarantee a correct model. Namely, if the
maximum capacity cut with these edge weights has value $Z_1$, then assigning “+”
to $t_{ij}$ with $(i,j) \in V^+$ and “−” to $t_{ij}$ with $(i,j) \in V^-$ gives a schedule with the
minimum number of $n(n-2) - Z_1 + \frac{n(n-1)}{2} M$ breaks, because all $\frac{n(n-1)}{2}$ edges of
capacity $M$ belong to such a cut. But, due to a result of Barahona and
Mahjoub [10], we can do much better. If we switch the signs of the capacities
of all edges in the star of any node $v$ in $G$, the maximum capacity cut induces
a maximum capacity cut with the original capacities in which $v$ changes shores
and all other nodes stay at their shore.

Therefore, for each edge with capacity $M$ we switch the capacity of the star
of one of its end-nodes. The resulting edge with capacity $-M$ will be in no
maximum cut, so we can contract the edge. We obtain a maximum cut instance
with $n(n-1)/2$ nodes and $n(n - 2)$ edges with capacities either 1 or $-1$. The result
for our example is shown in Fig. 3, in which we have (arbitrarily) chosen to
switch the cut of the lower indexed vertex each time.

The same graph is displayed in Fig. 4 along with a cut of maximum capacity
22 that is indicated by white and grey nodes, respectively.
Fig. 3. Result of the transformation (thin edges have capacity 1, fat edges have capacity −1)

Fig. 4. A cut of maximum capacity 22 (legend: capacity 1, in the cut; capacity −1, not in the cut; capacity −1, in the cut)
By backward transformation, this cut corresponds to the home-away
assignment displayed in Fig. 1(a) with 8 breaks, which proves our claim that 8 breaks
is minimum for this instance.
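The reduction can be tried out on a tiny league with the following Python sketch. It is an illustration under stated assumptions, not the authors' implementation: it creates one node per game directly (which amounts to contracting the big-M edges), and it gives an edge capacity −1 exactly when the team is the lower-indexed opponent in one of the two consecutive games but not in the other, which is one hypothetical way of fixing whose star is switched. With this bookkeeping, an edge of capacity 1 is a non-break when it is in the cut and an edge of capacity −1 is a non-break when it is not, so the number of breaks equals $|E|$ minus the number of edges of capacity −1 minus the cut value. The maximum cut is found by enumeration, which only works for very small instances.

```python
from itertools import product

def build_maxcut_instance(opp):
    """opp[i][j]: 0-indexed opponent of team i in week j (assumed layout)."""
    n, weeks = len(opp), len(opp[0])

    def game(i, j):   # node of the contracted graph: the game itself
        return (min(i, opp[i][j]), max(i, opp[i][j]), j)

    def sign(i, j):   # +1 if team i is the lower-indexed opponent
        return 1 if i < opp[i][j] else -1

    edges = []
    for i in range(n):
        for j in range(weeks - 1):
            capacity = sign(i, j) * sign(i, j + 1)
            edges.append((game(i, j), game(i, j + 1), capacity))
    nodes = sorted({game(i, j) for i in range(n) for j in range(weeks)})
    return nodes, edges

def max_cut_bruteforce(nodes, edges):
    """Enumerate all bipartitions of the nodes; tiny instances only."""
    best = None
    for bits in product([0, 1], repeat=len(nodes)):
        side = dict(zip(nodes, bits))
        value = sum(c for u, v, c in edges if side[u] != side[v])
        if best is None or value > best:
            best = value
    return best

def min_breaks_via_maxcut(opp):
    nodes, edges = build_maxcut_instance(opp)
    negative = sum(1 for _, _, c in edges if c < 0)
    return len(edges) - negative - max_cut_bruteforce(nodes, edges)

opp = [[1, 2, 3], [0, 3, 2], [3, 0, 1], [2, 1, 0]]
print(min_breaks_via_maxcut(opp))
```

For this four-team schedule the graph has $n(n-1)/2 = 6$ nodes and $n(n-2) = 8$ edges, in line with the counts given above.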

1.3 Computational Results 

For the computation of maximum capacity cuts we have used an implementation
of the branch-and-cut algorithm described in […] that was re-implemented by
Martin Diehl using the ABACUS software [31], version 2.3, with Cplex 6.5 as
an LP-solver. The same implementation has been used successfully in, e.g., [25,
26] for computing ground states of Ising spin glasses.

The instances were created by computing optimum schedules (with $n - 2$ breaks)
by Schreuder’s procedure and permuting the columns randomly. For each size,
we created five random schedules and applied the branch-and-cut algorithm.
The results displayed in Table 1 were obtained on a 296 MHz Sun Ultra SPARC
machine.



Table 1. Computational Results 
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We also considered one real world instance, namely the current Bundesliga 
(first national German soccer league) instance. There are 18 teams and we found 
that the schedule is already optimum at 16 breaks! 

We do not have access to the instances used in the computational studies
by Régin and Trick; however, at the time of writing this, it seems that Trick’s
approach is slightly ahead in solving 22-team instances in about 1 hour on a
266 MHz machine. Apparently, our approach is able to handle larger instances
easily.



2 A Bit of History 

Mathematical Programming, originating as a scientific discipline in the late for- 
ties, is concerned with the search for optimal decisions under restrictions. The 
most prominent mathematical models include linear programming, (mixed) in- 
teger linear programming and combinatorial optimization with linear objective 
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function. A surprisingly large variety of problems in economics, mathematics, 
computer science, operations research and the natural sciences can be captured 
by these models and effectively solved by appropriate software. Years before com- 
puter science emerged as a scientific discipline, the protagonists of mathematical 
programming, who were mainly applied mathematicians or mathematically ori- 
ented economists, aimed at the development of algorithms that could solve the 
problem instances, in the beginning primarily economic and military planning 
applications, by hand calculations and, when electronic computers became avail- 
able in the early fifties, via computer software. 



Linear Programming 

The father of linear programming is George B. Dantzig, who proposed the model

$$\begin{array}{ll}
\text{maximize} & c^T x \\
\text{subject to} & Ax \le b \\
& x \ge 0
\end{array} \tag{1}$$

where $c \in \mathbb{R}^n$, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and invented the celebrated simplex
algorithm for its solution. In the first sentence of his textbook “Linear Programming
and Extensions” he states: “The final test of a theory is its capacity to
solve the problems which originated it” and before he proceeds to the
acknowledgments, he writes: “The viewpoint of this work is constructive. It reflects the
beginning of a theory sufficiently powerful to cope with some of the
challenging decision problems upon which it was founded.” Not only have the model and
the simplex algorithm persisted to this day, but the philosophy he
summarizes in these two sentences has also remained a leitmotif for mathematical
programmers until today, as Michel Balinski puts it in [5]: “First the real problems,
then the theory ... ”, and we can safely add: “ ... and then implementing
and testing it on the real problems.” The simplex algorithm was implemented
in the US National Bureau of Standards on the Standards Eastern Automatic
Computer (SEAC) as early as 1952, and computational testing and comparison
with competing methods was performed in a way that meets today’s standards
for such experiments. Slightly later, around 1954, William Orchard-Hays of the
RAND Corporation designed the first commercial implementations of the
(revised) simplex algorithm for the IBM CPC, IBM 701 and IBM 704 machines.
Direct commercial applications in the oil industry were also started as early as
1952 by A. Charnes, W.W. Cooper and B. Mellon [13].



Mixed Integer Programming 

It soon became clear, though, that in real problems integrality of some of the
decision variables is required. If an optimal plan of activities requires using 1.3
airplanes, this makes no sense. Also, very often yes/no decisions are desired
that can be modeled using binary variables. So the (mixed) integer (binary)
linear programming model is the linear programming model in which (some)
variables are required to take integral (binary) values. In 1958, Ralph Gomory
[32] presented an elegant algorithmic solution to the integer linear programming
problem that was later refined to the mixed integer case. Interestingly, he not
only invented the “cutting plane” method and proved its finiteness and
correctness, but he also implemented it in the same year on an E 101 and an IBM 704
computer in the then brand-new FORTRAN language, in order to see how it
performs. Unfortunately, the early experiments were not very satisfactory in
terms of practical efficiency, and Gomory’s cutting plane method, even though
appreciated for its theoretical beauty, was not recommended for practical
computations until, only a few years ago, Balas, Ceria, Cornuéjols, and Natraj [1] gave
computational evidence that it should be reconsidered. Instead, the
branch-and-bound method proposed by A.H. Land and A.G. Doig in 1960 […] became the
method of choice in all commercial computer codes that were provided by many
computer companies starting with linear programming in the sixties and mixed
integer programming in the seventies.



Combinatorial Optimization 

In combinatorial optimization with a linear objective function, the task consists
of selecting from a family of subsets $\mathcal{F} \subseteq 2^E$ of a finite ground set $E$ one $F \in \mathcal{F}$
that maximizes (minimizes) a linear objective function $\sum_{e \in F} c_e$ for coefficients
$c_e \in \mathbb{R}$, $e \in E$. Mathematically, this is trivial, because an optimizing set $F$ can be
found by finite enumeration; however, such a strategy is clearly unsatisfactory for
practical computation. Some of the finest algorithms in computer science and
mathematical programming that have been formulated for problems like the
minimum spanning tree problem, the shortest path problem, or the matching
problem by Kruskal […], Dijkstra [27] and Edmonds […], respectively, fit into
this framework, as well as many others in combinatorics and (hyper-) graph
theory.

In our examples, the finite ground set $E$ consists of the edges of a graph $G$ and
the feasible solutions are the spanning trees, the paths connecting two specified
nodes and the matchings in $G$, respectively.
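The contrast between finite enumeration and a dedicated algorithm can be illustrated on the minimum spanning tree problem. The Python sketch below is only an illustration with hypothetical function names and a made-up four-node graph: `brute_force_mst` enumerates all edge subsets of size $n - 1$ from the family of spanning trees, while `kruskal` is a textbook version of Kruskal's greedy algorithm with a simple union-find structure.

```python
from itertools import combinations

def kruskal(n, edges):
    """edges: list of (weight, u, v) on nodes 0..n-1.
    Returns (total weight, chosen edges) of a minimum spanning forest."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    total, tree = 0, []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:       # edge joins two components: keep it
            parent[root_u] = root_v
            total += weight
            tree.append((u, v))
    return total, tree

def brute_force_mst(n, edges):
    """Finite enumeration over all edge subsets of size n - 1."""
    best = None
    for subset in combinations(edges, n - 1):
        total, tree = kruskal(n, list(subset))
        if len(tree) == n - 1 and (best is None or total < best):
            best = total           # the subset is a spanning tree
    return best

edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3), (5, 0, 2)]
print(kruskal(4, edges)[0], brute_force_mst(4, edges))
```

Both routines return the same optimum value, but the enumeration inspects a number of subsets that grows exponentially with the graph size, whereas Kruskal's algorithm is dominated by sorting the edges.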



Branch-and-Cut 

Unlike these examples, most problems with practical applications have turned
out to be NP-hard. Nevertheless, an obvious connection to binary integer
programming leads to an algorithmic methodology that has proved
able to solve real world instances to optimality anyway. Any $F \subseteq E$ is
represented by its characteristic vector $\chi^F \in \{0, 1\}^E$ in which $\chi^F_e = 1$ if and only
if $e \in F$. Passing from the feasible subsets to their characteristic vectors, it is
usually not hard to define the problem as a binary linear programming problem,
whereas it is usually a long way, theoretically and in terms of implementation
effort, to make the well developed linear programming techniques exploitable in
effective computation.
We demonstrate this on a prominent example, the traveling salesman
problem (TSP), in which the task is to find a Hamiltonian cycle (“tour”) with
minimum total edge weight in a complete undirected graph. Incidentally, this is the
problem for which now commonly used optimization techniques were first
outlined by G.B. Dantzig, D.R. Fulkerson and S.M. Johnson in 1954 [21]. If $x_{ij}$ is
a variable corresponding to the edge connecting nodes $i$ and $j$, and $c_{ij}$ is the
weight (length) of this edge, then



minimize 
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( 2 ) 



is an integer programming formulation in which the equations make sure that
in the solution every node has degree two and the inequalities, called subtour
elimination constraints, guarantee that we do not obtain a collection of subtours.
The variables with value 1 in an optimum solution of this integer program are in
one-to-one correspondence with the edges of an optimum tour. Discarding the
integrality conditions, we obtain a linear programming relaxation (subtour
relaxation) that can be used in a branch-and-bound environment: A simple scheme
would solve the relaxation at every subproblem in which some variables have
been set to 0 or 1. If the solution is not integral, two new subproblems, with
one non-integral variable set to 1 in one and to 0 in the other, are created in
a branching step. However, this linear programming relaxation contains an
exponential number of constraints ($\Theta(2^n)$) for an instance on $n$ nodes. This leads
to the idea to solve the relaxation with a small subset of all constraints, check
if the optimum solution violates any other of the constraints, and if so, append
them to the relaxation. Thus we get a cutting plane method at each node of the
enumeration tree that arises by branching.
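For reference, one standard way of writing the model described in this paragraph is given below. This is a sketch under assumptions rather than a reproduction of formulation (2): the subtour elimination constraints are stated in their cut form, $V$ denotes the node set with $|V| = n$, and $x_{ij}$ and $x_{ji}$ denote the same variable.

```latex
\begin{align*}
\text{minimize}\quad   & \sum_{1 \le i < j \le n} c_{ij} x_{ij} \\
\text{subject to}\quad & \sum_{j \ne i} x_{ij} = 2
                         && \text{for all } i \in V, \\
                       & \sum_{i \in W} \sum_{j \in V \setminus W} x_{ij} \ge 2
                         && \text{for all } W \subset V,\ 2 \le |W| \le n - 2, \\
                       & 0 \le x_{ij} \le 1
                         && \text{for all } 1 \le i < j \le n, \\
                       & x_{ij} \text{ integral}
                         && \text{for all } 1 \le i < j \le n.
\end{align*}
```

Given the degree equations, the cut form is equivalent to requiring $\sum_{i, j \in W,\, i < j} x_{ij} \le |W| - 1$ for the same sets $W$; either family contains exponentially many inequalities.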

This method is called branch-and-cut. It was first formulated for and
successfully applied to the linear ordering problem by Grötschel, Jünger, and Reinelt
in […]. The first state-of-the-art branch-and-cut algorithm was published by
Padberg and Rinaldi in […], where it was used to solve large TSP instances.
The major new features of the Padberg-Rinaldi algorithm are the
branch-and-cut approach in conjunction with the use of column/row generation/deletion
techniques, sophisticated separation procedures and an efficient use of the LP
optimizer.

There are many ways to strengthen the subtour relaxation by further classes
of inequalities. Identifying some of them that are violated by a given fractional
solution is called the separation problem. This problem can be solved in polynomial
time for the subtour elimination constraints. Of course, it would be desirable
to compute a complete description by linear equations and inequalities of the
convex hull of the characteristic vectors of tours, which is the TSP polytope
$P^n_{TSP} = \operatorname{conv}\{\chi \in \{0, 1\}^{E_n} \mid \chi$ is the characteristic vector of a Hamiltonian
cycle in $K_n\}$. Here, $K_n = (V_n, E_n)$ denotes the complete undirected graph on $n$
nodes. Such a description exists by classical theorems of Minkowski and Weyl.
However, results of Karp and Papadimitriou [45] make it unlikely that we can
compute it except for very small $n$. Even when we restrict ourselves to
non-redundant inequality systems, in which each inequality defines a facet of the
polytope, i.e., a proper face of maximum dimension, the number of needed
inequalities is enormous, e.g., 42,104,442 for $n = 9$ and at least 51,043,900,866 for
$n = 10$ […].
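The polynomial-time separation of subtour elimination constraints mentioned above can be sketched as follows. Written in cut form, a subtour elimination constraint for a node set $W$ requires the total $x$-value on the edges leaving $W$ to be at least 2, so a most violated constraint corresponds to a global minimum cut in the graph weighted by the LP solution. The Python sketch below uses the Stoer-Wagner minimum cut algorithm, chosen here for brevity; it is not claimed to be the routine used in the implementations cited in this chapter, and the function names and the dictionary layout of `x` are assumptions.

```python
def global_min_cut(n, x):
    """Stoer-Wagner minimum cut. x maps node pairs (i, j) to edge weights.
    Returns (cut value, node set on one shore of a minimum cut)."""
    w = [[0.0] * n for _ in range(n)]
    for (i, j), value in x.items():
        w[i][j] += value
        w[j][i] += value
    groups = [[v] for v in range(n)]   # original nodes merged into each node
    active = list(range(n))
    best_value, best_set = float("inf"), None
    while len(active) > 1:
        # Minimum cut phase: repeatedly add the most tightly connected node.
        attach = {v: 0.0 for v in active}
        remaining, order = set(active), []
        while remaining:
            v = max(remaining, key=lambda u: attach[u])
            remaining.remove(v)
            order.append(v)
            for u in remaining:
                attach[u] += w[v][u]
        s, t = order[-2], order[-1]
        if attach[t] < best_value:     # cut separating t from all the rest
            best_value, best_set = attach[t], set(groups[t])
        # Merge t into s for the next phase.
        groups[s].extend(groups[t])
        for u in active:
            if u != s and u != t:
                w[s][u] += w[t][u]
                w[u][s] = w[s][u]
        active.remove(t)
    return best_value, best_set

def violated_subtour(n, x, eps=1e-9):
    """Return a node set W whose cut constraint x(delta(W)) >= 2 is violated,
    or None if all subtour elimination constraints hold."""
    value, shore = global_min_cut(n, x)
    return shore if value < 2 - eps else None

two_triangles = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
                 (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0}
print(violated_subtour(6, two_triangles))
```

In a cutting plane loop, the returned set $W$ would be turned into an inequality and appended to the relaxation; a return value of `None` certifies that the current solution satisfies all subtour elimination constraints.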

The field of polyhedral combinatorics deals with identifying subsystems of
such complete and non-redundant linear descriptions. In order to make a subclass
computationally exploitable, we must devise corresponding separation algorithms, or
at least useful heuristics, if the problem is proven to be or appears to be hard.

Computer implementations of branch-and-cut algorithms for the TSP have
been devised by a few research groups, the first, in which the term
“branch-and-cut” was used for the first time, by Padberg and Rinaldi. Such programs
solve most instances with a few hundred cities routinely and also some instances
with a few thousand cities; see […] for a recent survey of the theory and
practice of this method for the TSP, […] for an account of how the same methodology
can be applied to other combinatorial optimization problems, and […] for a
recent annotated bibliography on branch-and-cut algorithms. The first essential
ingredient that makes such an approach work consists of a theoretical part that
requires creative analytical investigations on identifying appropriate classes of
inequalities (polyhedral combinatorics) and the design of separation algorithms.
Together with the implementation of the separation algorithms this is highly
problem specific (see Section 4.1). The second ingredient is integrating the
separation software into a cutting plane framework and this into an enumerative
frame. This latter part is much less problem specific but requires a considerable
implementation effort that can be reduced by an appropriate software system.



Branch-and-Price 

We use the binary cutting stock problem, another example of a combinatorial optimization
problem, in order to introduce a special version of “branch-and-cut”, namely
branch-and-price. In branch-and-price the cut generation phase never takes place,
but only columns are dynamically added to the LP relaxation at every node of the
enumeration tree. Further details on this approach will be given in Section 4.2.
In the binary cutting stock problem, a set of $n$ rolls of lengths $a_1, a_2, \ldots, a_n$ has
to be cut out of base rolls with length $L$. The problem is to determine a cutting
strategy that minimizes the number of base rolls used. In 1961, P.C. Gilmore
and R.E. Gomory […] proposed the following model: The vector $b \in \{0, 1\}^n$
represents a cutting pattern for a base roll if $\sum_{i=1}^{n} a_i b_i \le L$. The columns
of the matrix $B \in \{0, 1\}^{n \times m}$ represent all possible cutting patterns. Then the
problem can be modeled as the following binary linear programming problem:

$$\begin{array}{lll}
\text{minimize} & \sum_{j=1}^{m} z_j & \\
\text{subject to} & \sum_{j=1}^{m} b_{ij} z_j = 1 & \text{for all } i \in \{1, 2, \ldots, n\} \\
& 0 \le z_j \le 1 & \text{for all } j \in \{1, 2, \ldots, m\} \\
& z_j \in \{0, 1\} & \text{for all } j \in \{1, 2, \ldots, m\}
\end{array} \tag{3}$$

In any feasible solution $z_j = 1$ if and only if the $j$-th cutting pattern is used.
In contrast to our previous example in which we had a polynomial number of
variables and an exponential number of constraints, here we have the opposite
situation. If we solve the problem on a small subset of the columns, we have to
make sure that the missing columns can be safely assumed to be associated with
0 components of the optimum vector $z$. The simplex algorithm does not only
give us a basic feasible solution (geometrically this corresponds to a vertex of the
polyhedron defined by the restrictions) but also a short certificate for optimality
consisting of a quantity $y_i$ for each row $i$ such that $\sum_{i=1}^{n} b_{ij} y_i \le 1$ for all cutting
patterns $B_{\cdot j} = (b_{1j}, b_{2j}, \ldots, b_{nj})^T$ that are present in the chosen subset. In order
to determine if the same relation holds for all missing cutting patterns as well
we can solve the knapsack problem

$$\begin{array}{ll}
\text{maximize} & \sum_{i=1}^{n} y_i b_i \\
\text{subject to} & \sum_{i=1}^{n} a_i b_i \le L \\
& b_i \in \{0, 1\}
\end{array} \tag{4}$$

for which effective pseudo-polynomial algorithms exist. If the maximum is at
most 1, our solution is optimum for the complete problem, otherwise the
optimum pattern $b_1, b_2, \ldots, b_n$ found by the algorithm is appended as a new column
to the formulation. This process of evaluating the missing columns is called
“pricing” in linear programming terminology, and embedding the method into
an enumerative frame leads, as we already said, to a branch-and-price algorithm.
Cutting and pricing can be combined (and they are in all published
state-of-the-art algorithms for the optimum solution of the TSP) even when the problem
involves, like in this example, exponentially many variables. This methodology
is called branch-and-cut-and-price and will be described in more detail in
Section 4.2.
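The pricing step for the binary cutting stock problem can be sketched in a few lines of Python. The sketch assumes integer roll lengths and uses the textbook dynamic program for the 0/1 knapsack problem, one of the pseudo-polynomial algorithms alluded to above; the function name `price_pattern`, the tolerance `eps` and the sample data are hypothetical.

```python
def price_pattern(y, a, L, eps=1e-9):
    """Solve the pricing knapsack: maximize sum(y[i] * b[i]) subject to
    sum(a[i] * b[i]) <= L with b binary. Returns (value, pattern); the
    pattern is None when no missing column violates the certificate."""
    n = len(a)
    best = [0.0] * (L + 1)            # best[c]: maximum value with capacity c
    take = [[False] * (L + 1) for _ in range(n)]
    for i in range(n):
        # Descending capacities so that each roll is used at most once.
        for c in range(L, a[i] - 1, -1):
            candidate = best[c - a[i]] + y[i]
            if candidate > best[c]:
                best[c] = candidate
                take[i][c] = True
    # Trace back the choices to recover the pattern.
    pattern, c = [0] * n, L
    for i in range(n - 1, -1, -1):
        if take[i][c]:
            pattern[i] = 1
            c -= a[i]
    value = best[L]
    return (value, pattern) if value > 1 + eps else (value, None)

# Hypothetical dual values y for three rolls of lengths 3, 4, 5 and L = 7.
print(price_pattern([0.5, 0.6, 0.4], [3, 4, 5], 7))
```

A returned pattern is a column whose dual constraint is violated and which should therefore be appended to the restricted LP; if `None` is returned, the current LP solution is optimum over all patterns. The running time is proportional to $nL$, which is what makes the method pseudo-polynomial.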

3 Components of Branch-and-Cut 

Figure 5 shows a flowchart of a generic branch-and-bound algorithm for a
minimization problem. A basic branch-and-cut algorithm is a branch-and-bound
algorithm in which the bounds are solutions of LP-relaxations that are iteratively
strengthened by problem specific cutting planes at every node of the
enumeration tree. This feature incurs several technicalities that make the design and
implementation of branch-and-cut algorithms a nontrivial task. In this section
we will present a basic branch-and-cut algorithm, address such technical details
and give some ideas for an efficient implementation that proved to be useful in
practice.

Fig. 5. Flowchart of a branch-and-bound algorithm for minimization problems.

A first outline of a basic branch-and-cut algorithm is given in the flowchart
of Fig. 6, in which the dashed clusters correspond to boxes in the flowchart of
Fig. 5. Roughly speaking, the two leftmost of the four columns describe the
cutting plane phases within a single subproblem, the third column shows the
preparation and execution of a branching operation, and in the rightmost
column, the fathoming of a subproblem is performed. We give informal explanations
of all steps of the flowchart together with some problem specific details. In the
remainder of this section the underlying optimization problem is always assumed
to be a minimization problem. Moreover, during the description of the basic
version of the algorithm we assume that in all its phases the set of variables remains
unchanged. In particular, each LP problem has a fixed number of columns, while
the number of rows increases or decreases after the modules called SEPARATE
and ELIMINATE have been executed (see Section 3.3). The full version of the
algorithm, where also columns are added to the LP, is described in Section 3.5.

Although most of the components of the presented algorithmic framework are
problem independent in general, we choose the TSP as our main example. This
has several reasons. First, the TSP is probably one of the most prominent
combinatorial optimization problems; second, it is the prototype problem for which in [55]
and [56] the superiority of branch-and-cut over the other known approaches was
shown; and finally, implementations of state-of-the-art TSP solvers like the ones of
Jünger/Naddef/Reinelt/Thienel or Applegate/Bixby/Chvátal/Cook (see
also [→ Applegate/Bixby/Chvátal/Cook]) contain all algorithmic techniques we
want to demonstrate. Adaptations of these techniques can easily be applied to
branch-and-cut algorithms for other combinatorial optimization problems.

Fig. 6. Flowchart of a basic branch-and-cut algorithm.

In our description, we proceed as follows. First we describe the enumerative
part of the algorithm, i.e., we discuss in detail how branching and selection
operations can be performed. Then we explain the work done in a subproblem of
the enumeration. We also discuss sparse graph techniques which lead to column
generation.

There are two major ingredients of a branch-and-bound algorithm, the com- 
putation of global upper and local lower bounds. The lower bounds are produced 
by performing an ordinary cutting plane algorithm for each subproblem. Two 
basic techniques for the computation of upper bounds (corresponding to feasible 
solutions of the original problem) are currently being used. The first method is 
to compute a good upper bound by some heuristics before the root node of the 
complete branch-and-cut tree is processed. Later this bound can only be im- 
proved, if the solution of a linear program is the characteristic vector of a better 
feasible solution. The other method is the computation of upper bounds in the 
cutting plane phase by exploiting fractional LP-solutions. This technique may 
require more running time spent for heuristics, yet may decrease the total run- 
ning time, since the size of the enumeration tree may be smaller. For the TSP, 
we describe how the LP-solution can be utilized in order to find good feasible
solutions, i.e., upper bounds. 

3.1 Terminology 

Before going into details, we have to define some terminology that is used not 
only in this section but also in Section where we discuss the object oriented 
software framework ABACUS which implements the generic branch-and-cut 
algorithm of Fig. Due to a corresponding naming scheme in ABACUS every 
algorithmic component or variable of the described algorithm is easily identified 
as a module, variable or data structure in the software. 

Since in a branching step like in a branch-and-bound algorithm two (or more) 
new subproblems are generated, the set of all subproblems can be represented 
by a binary (k-nary) tree, which we call the branch-and-cut tree. Hence we call
a subproblem a branch-and-cut node. Fig. 7 shows an example of a branch-and-cut
tree. We distinguish between four types of branch-and-cut nodes. The node
which is currently being processed is called the current branch-and-cut node. The 
other unfathomed leaves of the branch-and-cut tree still have to be processed and 
are called the active nodes. Finally, there are the already processed non-active 
nodes. A non-active node can either be fathomed or not fathomed. 

Fig. 7. A branch-and-cut tree (legend: nonactive, not fathomed; nonactive, fathomed).

Each variable has one of the following attributes during the computation:
atlowerbound, basic, atupperbound, settolowerbound, settoupperbound,
fixedtolowerbound, fixedtoupperbound. When we say that a variable is fixed
to zero (lower bound) or one (upper bound), it means that it is at this value
for the rest of the computation. If it is set to zero (lower bound) or one (upper 
bound), this value remains valid only for the current branch-and-cut node and 
all branch-and-cut nodes in the subtree rooted at the current one in the branch- 
and-cut tree. The conditions for fixing and setting variables will be explained 
later in Section The meanings of the other attributes are obvious: As soon as 
an LP has been solved, each variable which has not been fixed or set receives one 
of the attributes atlowerbound, basic or atupperbound by the revised simplex 
method with lower and upper bounds. 

Finally, the variable lpval always denotes the optimum value of the last LP
that has been solved, which is also a local lower bound llb for the currently
processed node; the global variable gub (global upper bound) gives the value
of the currently best known feasible solution. The minimum lower bound of all
active branch-and-cut nodes and the current branch-and-cut node is the global
lower bound glb for the whole problem. The subtree rooted at the highest common
ancestor of all active and the current branch-and-cut nodes is called the
remaining branch-and-cut tree. Therefore, we call this highest common ancestor
also the root of the remaining branch-and-cut tree and the local lower bound
of this node is called rootlb. The difference between glb and rootlb will be
discussed below.

Like in branch-and-bound terminology we call a subproblem fathomed, if the
local lower bound lpval of this subproblem is greater than or equal to the global
upper bound gub, or if the subproblem becomes infeasible (e.g., if branching
variables have been set in a way that the subproblem does not contain any
feasible solution), or if the subproblem is solved, i.e., the solution of the
LP-relaxation is a feasible solution of the original problem.

In many applications all objective function coefficients are integer. In that 
case all feasible solutions have an integer value. Therefore, all terms of the com- 
putation that express a lower bound may be rounded up, e.g., one can fathom a 
node with global upper bound gub and local lower bound llb, if ⌈llb⌉ ≥ gub.
For some specially structured problems it might be valid to round up the local 
lower bound more than one unit, for example to the next even (or odd) integer. 
These extended roundings should always be applied since tightening the bounds 
may be essential for practical efficiency. 
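
The fathoming test with rounding can be sketched in a few lines. The following Python fragment is only an illustration under our own assumptions: the function name `can_fathom`, the flag `integral_objective` and the tolerance `eps` are hypothetical and not taken from ABACUS. It applies the rule that a node is fathomed when its (possibly rounded-up) local lower bound is greater than or equal to the global upper bound.

```python
import math


def can_fathom(llb: float, gub: float, integral_objective: bool = True,
               eps: float = 1e-9) -> bool:
    """Return True if a node with local lower bound llb can be fathomed.

    Hypothetical helper: with integer objective coefficients the lower
    bound is rounded up before it is compared with the upper bound gub.
    """
    if integral_objective:
        # Subtract a small tolerance so that numerical noise such as
        # 41.0000000001 is not rounded up to 42.
        llb = math.ceil(llb - eps)
    return llb >= gub
```

For example, with gub = 42 a node with llb = 41.2 is fathomed when the objective is known to be integral, but not otherwise.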

The branch-and-cut algorithm consists of three different parts: The enumer- 
ative frame, the computation of lower bounds and the computation of upper 
bounds. It is easy to identify the boxes of the flowchart of Fig. with the dashed 
boxes of the flowchart of Fig. 6.

The central part is the lower bounding part that is performed after the se- 
lection of a new current subproblem. It consists of trying to solve the current 
problem by optimizing over LP-relaxations that are tightened by adding cutting 
planes. This bounding part is left, 

— if the local lower bound is greater than or equal to the global upper bound, 

— if the LP-solution is the characteristic vector of a feasible solution, 

— if no more cutting planes can be generated, 

— if infeasibility of the current subproblem is detected, 

— if the local lower bound does not increase significantly, although cutting planes
are added (tailing off).

It is advantageous, although not necessary for the correctness of the algorithm, 
to reenter the bounding part if variables are fixed or set to new values by FIX 
AND SET, instead of creating new subproblems in BRANCH. 



3.2 Enumerative Frame 

The enumerative frame consists of all parts of the branch-and-cut algorithm 
except the bounding part (the leftmost dashed box of Fig. 6).



Initialize. During the computation the algorithm stores a set of active branch- 
and-cut nodes. After input of the problem, the set of active branch-and-cut 
nodes is initialized as the empty set. To initialize the global upper bound gub, 
feasible solutions are computed by some heuristic methods. For the TSP, we can 
construct a tour with the nearest neighbor heuristic and improve it with a
Lin-Kernighan procedure [48]. Afterwards the root node of the complete
branch-and-cut tree becomes the current branch-and-cut node which is now processed
by the bounding part. 

Fig. 8. Root change of a branch-and-cut tree.



Bounding. The computation of the lower and upper bounds is outlined in
Sections 3.3 and 3.4. We continue the explanation of the enumerative frame at the
ordinary exit of the bounding part (at the end of the first column of the dashed
bounding box). If the current branch-and-cut node cannot contain a feasible
solution that is better than the best known one (lpval ≥ gub), or the final
LP-solution is the characteristic vector of a feasible solution (feasible), the
node is fathomed. Otherwise a branching operation and the selection of another
branch-and-cut node for further processing (third column of the flowchart) are
prepared.



Fix and Set. The routine FIX AND SET of Fig. 6 consists of the four procedures
FIXBYREDCOST, FIXBYLOGIMP, SETBYREDCOST and SETBYLOGIMP. If a branching
operation is prepared, and the current branch-and-cut node is the root node of
the branch-and-cut tree, the reduced costs of the non-basic variables obtained
from the LP-solver can be used to fix them forever at their current values by
the routine FIXBYREDCOST. Namely, if for an edge $e$ the variable $x_e$ is
non-basic, has value $\bar{x}_e$ and reduced cost $r_e$, we can fix $x_e$ to zero if
$\bar{x}_e = 0$ and $\mathtt{rootlb} + r_e > \mathtt{gub}$, and we can fix $x_e$ to one if
$\bar{x}_e = 1$ and $\mathtt{rootlb} - r_e > \mathtt{gub}$.
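
To make the fixing criterion concrete, here is a minimal Python sketch. It assumes the usual sign convention for a minimization problem (a non-basic variable at 0 has non-negative reduced cost, one at 1 has non-positive reduced cost) and the criteria rootlb + r > gub for fixing to zero and rootlb − r > gub for fixing to one, where r denotes the reduced cost. The function name and the data layout are our own choices for illustration.

```python
def fix_by_reduced_cost(candidates, rootlb, gub):
    """Return a dict {edge: fixed value} for all candidates that can be fixed.

    candidates: iterable of (edge, value, reduced_cost) triples describing
    non-basic variables, where value is 0 or 1 (hypothetical layout).
    """
    fixed = {}
    for edge, value, reduced_cost in candidates:
        if value == 0 and rootlb + reduced_cost > gub:
            # Raising the variable to 1 would push the bound above gub.
            fixed[edge] = 0
        elif value == 1 and rootlb - reduced_cost > gub:
            # Lowering the variable to 0 would push the bound above gub.
            fixed[edge] = 1
    return fixed
```

Since gub can only decrease during the computation, the same candidate list may be checked again later and yield additional fixings.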

During the computational process, the value of gub decreases, so that at some 
later point in the computation, one of these criteria can be satisfied, even though 
it is not satisfied at the current point of the computation. Therefore, each time 
when we get a new root of the remaining branch-and-cut tree, we make a list of 
candidates for fixing of all non-basic variables along with their values (0 or 1) 
and their reduced costs and update rootlb. We get a new root of the remaining 
branch-and-cut tree, if all nodes in all subtrees except one subtree of the old 
root are fathomed (see Fig. 8).

Since storing these lists in every node, which might eventually become the 
root node of the remaining active nodes in the branch-and-cut tree, would use 
too much memory space, we process the complete bounding part a second time 
for the node, when it becomes the new root. If we could initialize the constraint 
system for the recomputation by those constraints that were present in the last 
LP of the first processing of this node, we would need only a single call of the sim- 
plex algorithm. However, this would require too much memory. So we initialize 
the constraint system with the constraints of the last solved LP. As some facets
are separated heuristically, it is not guaranteed that we can achieve the same
local lower bound as in the previous bounding phase. Therefore we not only have 
to use the reduced costs and statuses of the variables of this recomputation, but 
also the corresponding local lower bound as rootlb in the subsequent calls of the 
routine FIXBYREDCOST. If we initialize the basis by the variables contained 
in the best known solution and call the primal simplex algorithm, we can avoid 
phase 1 of the simplex method. Of course this recomputation is not necessary 
for the root of the complete branch-and-cut tree, i.e., the first processed node. 
The list of candidates for fixing is checked by the routine FIXBYREDCOST 
whenever it has been freshly compiled or the value of the global upper bound 
gub has improved since the last call of FIXBYREDCOST. 

FIXBYREDCOST may find that a variable can be fixed to a value oppo- 
site to the one it has been set to (contradiction). This means that earlier in 
the computation, somewhere on the path of the current branch-and-cut node 
to the root of the branch-and-cut tree, we have made an unfavorable decision 
which led to this setting either directly in a branching operation or indirectly 
via SETBYREDCOST or SETBYLOGIMP (to be discussed below). 

Before starting a branching operation and if no contradiction has occurred, 
some fractional (basic) variables may have been fixed to new values (0 or 1). In 
this case we solve the new LP rather than performing the branching operation. 



Fixbylogimp. After variables have been fixed by FIXBYREDCOST, we call 
FIXBYLOGIMP. In contrast to reduced cost fixing this is not problem inde- 
pendent. For the TSP, this routine might try to fix more variables by logical 
implication as follows: If two edges incident to a node v have been fixed to 1, all
other edges incident to v can be fixed to 0. Or it might fix the edge connecting 
two given nodes to 0 if they are connected by a path of edges fixed to 1. Like 
in FIXBYREDCOST, contradictions to previous variable settings may occur. If 
variables are fixed to new values, we proceed as explained in FIXBYREDCOST. 

In principle also fixing or setting variables to zero could have logical impli- 
cations. If all incident edges of a node but two are fixed or set to zero, these 
two edges can be fixed or set to one. However, this occurs quite rarely and can 
therefore be disregarded. 
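
The first of these implications (two incident edges fixed to 1 force all other incident edges to 0) can be sketched as follows. This is an illustrative Python fragment only: the name `fix_by_degree_implication` and the representation of edges as tuples are assumptions of ours, and the path rule mentioned above is deliberately left out.

```python
from collections import defaultdict


def fix_by_degree_implication(edges, fixed):
    """Return new fixings {edge: 0} implied by the degree-two rule.

    edges: list of (u, v) tuples; fixed: dict {(u, v): 0 or 1} of edges
    that are already fixed. Raises ValueError if some node already has
    more than two incident edges fixed to 1 (a contradiction).
    """
    ones_at = defaultdict(int)
    for (u, v), value in fixed.items():
        if value == 1:
            ones_at[u] += 1
            ones_at[v] += 1
    for node, count in ones_at.items():
        if count > 2:
            raise ValueError(f"node {node} has {count} edges fixed to 1")

    implied = {}
    for u, v in edges:
        if (u, v) in fixed:
            continue
        # A node with two tour edges cannot take a third one.
        if ones_at[u] >= 2 or ones_at[v] >= 2:
            implied[(u, v)] = 0
    return implied
```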



Setbyredcost. While fixings of variables are globally valid for the whole compu- 
tation, variable settings are only valid for the current branch-and-cut node and 
all branch-and-cut nodes in the subtree rooted at the current branch-and-cut 
node. SETBYREDCOST sets variables by the same criteria as FIXBYREDCOST,
but based on the local reduced cost and the local lower bound llb of
the current subproblem rather than “globally valid reduced cost” and the lower 
bound of the root node rootlb. Contradictions are possible if in the meantime 
the variable has been fixed to the opposite value. In this case the current branch- 
and-cut node is fathomed. 



Setbylogimp. This routine is called whenever SETBYREDCOST has successfully
set variables, as well as after a SELECT operation. It tries to set more
variables by logical implication as follows: If two edges incident to a node v have 
been set or fixed to 1, all other edges incident to v can be set to 0 (if not fixed 
already). Like in SETBYREDCOST, all settings are associated with the current 
branch-and-cut node. If variables are set to new values, we proceed as explained 
in FIXBYREDCOST. 

After the selection of a new node in SELECT, we check if the branching 
variable of the father is set to 1 for the selected node. If this is the case, SET- 
BYLOGIMP may also set additional variables. 



Branch. Branching in general means splitting the current subproblem into two or
more new ones. There are many different strategies to achieve this. For example:

— a fractional 0/1 variable is set to 0 and 1 

— upper and lower bounds for integer variables are changed 

— dividing the polytope by hyperplanes 

— other problem specific strategies 

If we choose the first method some variable is chosen as the branching vari- 
able and two new branch-and-cut nodes, which are the two sons of the current 
branch-and-cut node, are created and added to the set of active branch-and-cut 
nodes. In the first son the branching variable is set to 1, in the second one to 0. If 
no constraints of the integer programming formulation are violated, then there 
is at least one fractional variable that is a reasonable candidate for a branch- 
ing variable. However if a constraint of the integer programming formulation is 
violated, it is possible that all variables have an integral LP- value, yet the LP- 
solution is not a feasible solution of the original problem. In this case a variable 
with an integral LP-value has to be chosen as branching variable. 

There is a variety of different strategies for the selection of the branching 
variable, so that we can present here only some of them. Let x be the solution 
of the last solved LP. 

1. Select a variable with value close to 0.5 that has a big objective function
coefficient in the following sense. Find $L$ and $H$ with
$L = \max\{x_e \mid x_e \le 0.5,\ e \in E\}$ and $H = \min\{x_e \mid x_e \ge 0.5,\ e \in E\}$.
Let $C = \{e \in E \mid 0.75L \le x_e \le H + 0.25(1 - H)\}$ be the set of variables
with value “close” to 0.5. From the set $C$ the variable with maximum cost is
selected, i.e., with maximum objective function coefficient.

2. Select the variable that has an LP-value closest to 0.5. 

3. Select the fractional variable (if available) that has maximum objective func- 
tion coefficient. 

4. If there are fractional variables that are equal to 1 in the currently best known 
feasible solution, select the one with maximum cost of them, otherwise, apply 
strategy [H 

5. Select a fractional variable (if available) that is closest to one, i.e., find a
variable $e^*$ with $x_{e^*} = \max\{x_e \mid x_e < 1\}$.

6. Select a set $L$ of promising branching variable candidates. Let $Ax \le b$ be
the constraint system of the last solved LP. Solve for each variable $i \in L$ the
two Linear Programs

$$v_i^0 = \max\{c^T x \mid Ax \le b,\ x_i = 0\}$$

$$v_i^1 = \max\{c^T x \mid Ax \le b,\ x_i = 1\}$$

and select the branching variable $b$ with

$$\max\{v_b^0, v_b^1\} = \min_{i \in L} \max\{v_i^0, v_i^1\}.$$

Some running time can be saved if instead of the solution of the Linear Pro- 
grams to optimality only a restricted number of iterations of the simplex- 
method is performed. Then the objective function value might already in- 
dicate the “quality” of the branching variable, especially if a steepest edge 
pivot selection criterion is applied. 

Computational experiments for the strategies m and m-m applied to the 
TSP can be found in |4^. Other branching variable selection strategies can be 
found in [^. 
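
Strategies 1 and 2 from the list above can be sketched as follows. The Python code is a simplified illustration: the function names are hypothetical, and, as an assumption that goes beyond the text, both functions only consider fractional variables (values strictly between 0 and 1 up to a tolerance) when computing L, H and the candidate set C.

```python
def fractional(x, eps=1e-6):
    """Return the fractional part of the LP-solution x (dict edge -> value)."""
    return {e: v for e, v in x.items() if eps < v < 1 - eps}


def select_closest_to_half(x):
    """Strategy 2: the variable whose LP-value is closest to 0.5."""
    frac = fractional(x)
    if not frac:
        return None
    return min(frac, key=lambda e: abs(frac[e] - 0.5))


def select_close_to_half_max_cost(x, cost):
    """Strategy 1: among values 'close' to 0.5, take the maximum cost."""
    frac = fractional(x)
    if not frac:
        return None
    low = [v for v in frac.values() if v <= 0.5]
    high = [v for v in frac.values() if v >= 0.5]
    # Fall back to the other side if one side is empty (our assumption).
    L = max(low) if low else min(high)
    H = min(high) if high else max(low)
    lower, upper = 0.75 * L, H + 0.25 * (1 - H)
    candidates = [e for e, v in frac.items() if lower <= v <= upper]
    return max(candidates, key=lambda e: cost[e])
```

Both functions return None when the LP-solution has no fractional variable; in that case a variable with an integral LP-value has to be chosen by other means, as described above.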

Instead of partitioning the set of feasible solutions by branching on a variable, 
it is also possible to use a hyperplane intersecting the polytope defined by the 
current subproblem. This alternative way of branching was proposed for the first 
time in [18] and used with a problem specific hyperplane for the TSP.

Another modification of the branching process is branching on $k > 2$ variables
or hyperplanes. In this case we get a $2^k$-nary instead of a binary branch-and-cut
tree.



Select. If the list of active branch-and-cut nodes is empty, the best known
feasible solution is the optimum solution. Otherwise a branch-and-cut node is
selected, removed from the set of active branch-and-cut nodes, and then
processed. After a selection the set variables (including the branching variables)
must be adjusted. If it turns out that some variable must be set to 0 or 1, yet
has been fixed to the opposite value in the meantime, we have a contradiction.
In this case the branch-and-cut node is fathomed. If the local lower bound llb
of the selected node exceeds the global upper bound gub, the node is fathomed
immediately and the selection process is continued.

Up to now we have not specified which node is selected from the set of
active branch-and-cut nodes. There are three well-known enumeration strategies
in branch-and-bound (branch-and-cut) algorithms: depth-first search,
breadth-first search and best-first search. We define the level of a branch-and-cut
node B as the number of edges on the path from the root of the branch-and-cut tree to
B. In the case of depth-first search a branch-and-cut node with maximum level
in the branch-and-cut tree is selected from the set of active nodes in SELECT,
whereas in breadth-first search a subproblem with minimum level is selected. In
best-first search the “most promising” node becomes the current branch-and-cut
node. For a minimization problem the node with minimum local lower bound
among all active nodes is often considered as most promising.

Computational experiments for the TSP (see [33]) show that depth-first 
search is an enumeration strategy with the “risk” of spending a lot of time 
in a branch of the tree, which is useless for computing better upper and lower 
bounds. Often the local lower bound of the current subproblem exceeds the ob- 
jective function value of an optimum solution, however, this node cannot be 
fathomed, because no good upper bound is known. The same phenomenon oc- 
curs also sometimes when using breadth-first search, but it is very rare if the 
enumeration is performed in best-first search order. 
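
The three enumeration strategies differ only in the key by which the next node is picked. The following Python sketch illustrates this; the node representation (a dict with the keys `level` and `llb`) and the function name are assumptions made for the example, and best-first search is implemented for a minimization problem, where the smallest local lower bound is the most promising one.

```python
def select_node(active, strategy="best-first"):
    """Remove and return the next node from the list of active nodes.

    active: list of dicts with keys 'level' and 'llb' (hypothetical layout).
    Returns None if no active node is left.
    """
    if not active:
        return None
    if strategy == "depth-first":
        key = lambda node: -node["level"]   # maximum level first
    elif strategy == "breadth-first":
        key = lambda node: node["level"]    # minimum level first
    elif strategy == "best-first":
        key = lambda node: node["llb"]      # minimum local lower bound first
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    node = min(active, key=key)
    active.remove(node)
    return node
```

A production implementation would keep the active nodes in a priority queue instead of scanning a list, but the selection rule is the same.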



Fathom. If for a node the global upper bound gub does not exceed the local 
lower bound llb, or a contradiction occurred, or an infeasible branch-and-cut
node has been generated, or if the LP-solution is a characteristic vector of a 
feasible solution, the current branch-and-cut node is deleted from further consid- 
eration. Even though a node is fathomed, the global upper bound gub may have 
changed during the last iteration, so that additional variables may be fixed by 
FIXBYREDCOST and FIXBYLOGIMP. The fathoming of nodes in FATHOM 
may lead to a new root of the branch-and-cut tree for the remaining active nodes. 



Output. The currently best known solution, which is either an optimum solu- 
tion or satisfies a desired guarantee requirement, is written to an output file. 



3.3 Computation of Local Lower Bounds 

The computation of local lower bounds consists of all elements of the leftmost 
dashed bounding box of Fig. 6 except EXPLOIT LP. In EXPLOIT LP the upper
bounds are updated, if the solution of the Linear Program is the characteristic 
vector of a better feasible solution. Also improvement heuristics, using information
from the LP-solutions, can be incorporated here as suggested in Section 3.4.

For the computation of lower bounds LP-relaxations are solved iteratively, 
violated valid inequalities are added, and non-binding constraints are deleted 
from the constraint matrix. 

In this section we will also point out that an additional data structure for 
inequalities, called pool, is very useful, although not necessary for the correctness 
of the algorithm. For now, we can think of a pool just as a collection of constraints 
or variables. 

The active inequalities are the ones in the current LP and are both stored in 
the pool and in the constraint matrix, whereas the inactive constraints are only 
present in the pool. The pool is initially empty. If an inequality is generated by 
a separation algorithm, it is stored both in the pool and added to the constraint 
matrix. Further details of the pool are outlined in Section 5.3.



Initialize New Node. If the current branch-and-cut node is the root node of
the branch-and-cut tree the LP is initialized by some small constraint system. 
Often the upper and lower bounds on the variables are a sufficient choice (e.g., 
for the maximum cut problem) . For the TSP, the degree constraints for all nodes 
are normally added. Augmenting the initial system by other promising cutting 
planes can sometimes reduce the overall running time (see [35]). A primal feasible
basis derived from a feasible solution can be used as a starting basis in order to 
avoid phase 1 of the simplex algorithm. 

Any set of valid (preferably facet defining) inequalities can be used to ini- 
tialize the first constraint system of subsequent subproblems. Yet, in order to 
guarantee monotonicity of the values of the local lower bounds in each branch 
of the enumeration tree, and to save running time, it is appropriate to initialize 
the constraint matrix by the constraints that were binding when the last LP of 
the father in the branch-and-cut tree was solved. 

Since the basis of the father is dual feasible for the initial LP of its sons, phase 
1 of the simplex method can be avoided by starting with this basis. The columns 
of non-basic set and fixed variables can be removed from the constraint matrix to 
keep the LP small. If their value is non zero, the right hand side of the constraint 
has to be adjusted, and the corresponding coefficients of the objective function 
must be added to the optimum value returned by the simplex algorithm in order 
to get the correct value of the variable lpval. Set or fixed basic variables should
not be deleted, because this would lead to a neither primal nor dual feasible basis 
and require phase 1 of the simplex method. The adjustment of these variables 
can be performed by adapting their upper and lower bounds. 



Solve LP. The LP is solved by the primal simplex method, if the basis is primal 
feasible (e.g., if variables have been added) or by the dual simplex method if 
the basis is dual feasible (e.g., if constraints have been added). The two-phase 
simplex method is required if the basis is neither primal nor dual feasible. This 
can happen if constraints necessary for the initialization of the first LP of a 
subproblem are not available since they had to be removed from the pool as we 
will describe in Section o 

The LP-solver is one of the bottlenecks of a branch-and-cut algorithm. 
Sometimes more than 90% of the computation time is spent in this proce- 
dure. Today, efficient implementations of the simplex method, like Cplex m 
or XPress [69] are competitive on solving Linear Programs from scratch. 
However, a branch-and-cut algorithm requires an LP-solver with very efficient
post-optimization routines. 

The simplex method satisfies all the requirements of a branch-and-cut algo- 
rithm and it is used by nearly all implementations of cutting plane algorithms. 
Therefore we have outlined the algorithm in this section under the assumption 
that the simplex method is used. Since the LPs that have to be solved in a cutting 
plane algorithm, are often highly degenerate, good pivot variable selection strate- 
gies, like the steepest-edge pivot variable selection criterion, are necessary. These 
degeneracies might even require some preprocessing of the LPs (see, e.g., [35]).



Exploit LP. First, we have to check if the current LP-solution is the characteristic
vector of a feasible solution. If this is the case we leave the bounding part
and fathom the current branch-and-cut node. Otherwise, most implementations 
of branch-and-cut algorithms proceed with the cutting plane generation phase. 
However sometimes we can do better by exploiting the fractional LP-solutions 
to improve the upper bound before additional cutting planes are generated. We 
will discuss these ideas in Section 3.4. Before the separation phase is performed
in SEPARATE, variables may be fixed or set as explained in FIX AND SET. 



Tailing Off. Often it is reasonable to abort the cutting plane part if no significant
increase of lpval in the most recent LP-solutions has taken place. This
phenomenon is called tailing-off (cf. [56]). If during the last k (e.g., k = 10)
iterations in the bounding part, lpval did not increase by more than p % (e.g.,
p = 0.01), new subproblems are created instead of generating further cutting
planes. Good choices for the parameters p and k are both rather problem specific
and dependent on the quality of the available cutting plane generation
procedures.
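
A tailing-off test along these lines might look as follows in Python. The sketch assumes that the values of lpval are recorded in a list after every LP, oldest first, and that the increase is measured relative to the value k iterations ago; the function name and this interpretation of "p %" are our own assumptions.

```python
def is_tailing_off(lpvals, k=10, p=0.01):
    """Return True if lpval increased by at most p percent in the last k LPs.

    lpvals: history of LP values of the current node, oldest first
    (hypothetical bookkeeping). With fewer than k + 1 entries the test
    cannot be applied and False is returned.
    """
    if len(lpvals) <= k:
        return False
    old, new = lpvals[-k - 1], lpvals[-1]
    threshold = abs(old) * p / 100.0
    return new - old <= threshold
```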



Separate. The separation phase is the central part of a branch-and-cut al- 
gorithm. We try to find violated globally valid (preferably facet-defining) con- 
straints, which are added to the LP. We say an inequality is globally valid, if it 
is a valid inequality for every subproblem of the branch-and-cut algorithm. We 
call a constraint locally valid, if it is only a valid inequality of a subproblem S 
and all subproblems of the subtree rooted at S. 

It may not always be a good strategy to call any available separation al- 
gorithm in each iteration of the cutting plane generation. Experiments show 
that a hierarchy of separation routines is often preferable. Certain separation 
methods should only be performed, if others have failed. Before calling a time 
consuming exact separation algorithm, one should attempt to generate cutting 
planes with faster heuristic methods. However, this hierarchy is rather problem 
specific so that we cannot give a general recipe for its application. We refer to 
the publications on specific implementations. 

The constraint pool provides us with another cutting plane generation tech- 
nique. Inactive constraints that are violated by the current LP-solution can be 
regenerated from the pool. Of course this method requires an efficient algorithm 
to perform this test and to transform the storage format of the constraint used 
in the pool into the storage format for the LP-solver. The pool-separation can 
be advantageous for classes of inequalities for which only heuristic separation 
routines are available. In this case it can happen that a constraint of this class 
is violated, yet cannot be identified by the heuristic. However, this cutting plane 
might have been generated earlier in the computational process (at a different LP 
solution which has been more “convenient” to our heuristic). If this constraint 
is still contained in the pool, it can be reactivated now. 
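
Pool-separation amounts to scanning the inactive constraints of the pool for violation by the current LP-solution. The Python fragment below is a simplified sketch under several assumptions of ours: constraints are stored as sparse coefficient dictionaries with a right hand side, all of them have the form "left hand side ≥ right hand side", and the names `separate_from_pool`, `pool` and `active` are hypothetical and unrelated to the ABACUS interface.

```python
def separate_from_pool(pool, active, x, eps=1e-6):
    """Return the names of inactive pool constraints violated by x.

    pool: dict name -> (coefficients, rhs), where coefficients is a dict
    variable -> coefficient and the constraint reads sum(coeff * x) >= rhs.
    active: set of names of constraints currently in the LP.
    x: dict variable -> value of the current LP-solution.
    """
    violated = []
    for name, (coefficients, rhs) in pool.items():
        if name in active:
            continue  # already part of the current LP
        lhs = sum(c * x.get(var, 0.0) for var, c in coefficients.items())
        if lhs < rhs - eps:
            violated.append(name)
    return violated
```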

It can also happen that the pool-separation for a class of constraints is more
efficient than a direct separation by a time consuming heuristic or an exact
algorithm. Therefore, the pool-separation should always be performed before calling
these algorithms. However, for other classes of constraints it can sometimes be 
observed that the pool-separation is very slow in comparison to direct separation 
methods. 

Since the pool can become very large during the computational process, it is 
necessary to limit the search in the pool for violated inequalities. For instance, 
the pool-separation can be restricted to some classes of constraints. Therefore 
the pool-separation should be carefully included into the hierarchy of separation 
algorithms and it requires many computational tests to find a strategy that is 
efficient for a specific combinatorial optimization problem. 

For some combinatorial problems like the maximum cut problem, often sev- 
eral hundred violated inequalities can be generated. However, it would be suffi- 
cient to add those constraints to the LP that will be binding after the solution 
of the next LP. Unfortunately we do not know this subset of the generated in- 
equalities. On the other hand, adding all the constraints to the matrix of the 
LP-solver can slow down the overall computation time. Therefore, depending on 
the performance of the LP-solver, only a limited number of constraints should be 
added to the LP. A straightforward approach is just stopping the cut generation 
when this limit is reached. A more sophisticated method might be generating 
as many constraints as possible, and afterwards selecting the “best” of them. 
A simple classification criterion is the degree of violation given by the value of 
the corresponding slack. For the TSP, Padberg and Rinaldi propose as a 
measure the distance of the LP-solution from the projection of the cut into the 
affine space defined by the degree equations. The larger this distance the better 
the cut, yet, this method is computationally expensive. Other quality measures 
for cutting planes like the angle of the cut defining hyperplane to the objective 
function vector have been investigated in US]. 
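
The simplest of these selection rules, ranking the generated cuts by their degree of violation and keeping only a limited number, can be sketched as follows. The function name and the representation of a cut as a (name, violation) pair are assumptions made for this illustration.

```python
def select_cuts(cuts, max_cuts):
    """Return the max_cuts most violated cuts, most violated first.

    cuts: list of (name, violation) pairs, where violation > 0 is the
    amount by which the current LP-solution violates the inequality.
    """
    violated = [cut for cut in cuts if cut[1] > 0]
    # sorted() is stable, so cuts with equal violation keep their order.
    return sorted(violated, key=lambda cut: cut[1], reverse=True)[:max_cuts]
```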

The representation of the inequality for the LP-solver can have significant 
influence on its running time. For instance, equations of the integer programming 
formulation can be added to any valid inequality without changing the half- 
space which it defines. However, the number of the non-zero coefficients in the 
inequality may differ. Normally, LP-solvers are more efficient if the number of 
non-zeros in the constraint matrix is small. 

The solution of the separation problem is very problem specific. Therefore 
we only want to present an example for the TSP. 

A polynomial time algorithm for the solution of the exact separation problem 
of subtour elimination constraints can be directly derived from their definition 
in Section 0 If the value of the minimum weight cut in the support graph (the 
graph with the LP-solution as edge weights) is greater than or equal to 2, the 
current LP-solution does not violate any subtour elimination constraint. Each 
cut with a value less than 2 induces a violated subtour elimination constraint. 
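
The cut criterion can be checked directly on very small instances. The following Python sketch enumerates all node subsets by brute force, which takes exponential time and is therefore not the polynomial minimum cut algorithm referred to above; it only illustrates the condition "cut value less than 2". The function name and the representation of the support graph (a dict from frozenset edges to LP-values) are our own assumptions.

```python
from itertools import combinations


def violated_subtour_cuts(nodes, x, eps=1e-9):
    """Return all (S, cut value) pairs whose cut value is less than 2.

    nodes: list of nodes; x: dict frozenset({u, v}) -> LP-value.
    The first node is kept outside S so that every cut is listed once.
    """
    rest = list(nodes)[1:]
    violated = []
    for size in range(1, len(rest) + 1):
        for subset in combinations(rest, size):
            inside = set(subset)
            # An edge crosses the cut if exactly one endpoint lies in S.
            cut = sum(value for edge, value in x.items()
                      if len(edge & inside) == 1)
            if cut < 2 - eps:
                violated.append((inside, cut))
    return violated
```

For an LP-solution consisting of two disjoint triangles on six nodes, the only cut reported is the one separating the two triangles, with value 0.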

So the separation problem for subtour elimination constraints reduces to a 
minimum capacity cut problem for which the practically most efficient solution
was given in Padberg and Rinaldi and refined in Jünger, Rinaldi, and
Thienel.



Eliminate. If inequalities are added to the constraint matrix of the LP-solver,
and no inequalities are eliminated, soon the size of the matrix might become too 
large to solve the linear programs in reasonable time and even the storage of the 
constraints in the matrix would require too much memory. Moreover, there are 
inequalities that become redundant for the rest of the computation. Therefore a 
strategy is required to maintain a reasonable sized matrix, yet not to eliminate 
important inequalities. 

An obvious and simple elimination strategy is to delete all active inequalities
that are non-binding in the last LP-solution from the constraint structure
before the LP is solved after a successful cutting plane generation phase. To
avoid cycling, i.e., a constraint being eliminated and then violated again by
the next LP-solution, constraints should be removed only if the value of the
slack s is big enough (e.g., s > 0.001) or if they have been non-binding during
several successive LP-solutions.
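
A minimal sketch of such an elimination rule is given below. It combines the two safeguards mentioned above in a single test, which is a choice made for this example; the function name, the dictionary-based bookkeeping, and the parameter `max_idle` are hypothetical, while the slack threshold 0.001 is the value quoted in the text.

```python
def select_for_elimination(slacks, idle_counts, min_slack=1e-3, max_idle=3, tol=1e-9):
    """Pick constraints to remove from the LP.

    slacks:      {constraint id: slack value in the last LP-solution}
    idle_counts: {constraint id: number of consecutive LPs in which the
                  constraint was non-binding}; updated in place.
    A constraint is removed if its slack exceeds min_slack, or if it has been
    non-binding in max_idle successive LP-solutions (hypothetical parameter).
    """
    to_remove = []
    for cid, s in slacks.items():
        if s > tol:                      # non-binding in the last LP-solution
            idle_counts[cid] = idle_counts.get(cid, 0) + 1
        else:                            # binding: reset the counter
            idle_counts[cid] = 0
        if s > min_slack or idle_counts[cid] >= max_idle:
            to_remove.append(cid)
    return to_remove
```

A constraint with a tiny positive slack therefore survives a few LPs and is removed only if it stays non-binding, which reduces the risk of eliminating and regenerating the same inequality repeatedly.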

3.4 Computation of Global Upper Bounds 

For most combinatorial optimization problems a host of heuristics is available
to compute feasible solutions that provide global upper bounds for the
branch-and-cut algorithm. Traditionally the computation of a global upper bound
is performed in the procedure INITIALIZE before the cutting plane generation
and enumeration phase starts. Later, better upper bounds are found only if the
LP-solution is the characteristic vector of a feasible solution. However, it
can be observed that this happens rather seldom. Therefore sophisticated
heuristics must be applied in INITIALIZE to generate a good upper bound.
Otherwise, the enumeration tree may grow too large.

In [43] a dynamic strategy for the computation of upper bounds, integrated in
the cutting plane generation part, is presented, which we briefly outline. It
turns out that the fractional LP-solutions occurring in the lower bound
computations in a branch-and-cut algorithm give hints on the structure of
optimum or near optimum feasible solutions.

The basic requirement for the upper bound computations is efficiency in 
order not to inhibit the optimization process. While in the first stages high 
emphasis is laid on providing good feasible solutions, this emphasis is less in the 
later stages of the computational process. On the other hand, computing upper 
bounds can always be reasonable since new knowledge about the structure of 
optimum feasible solutions is acquired (e.g., because of fixed and set variables). 

Exploiting the LP-Solution. Integer optimum solutions, i.e., characteristic 
vectors of feasible solutions, will almost never result from the LPs occurring 
in the branch-and-cut algorithm. But, it can be observed that these solutions, 
although having many fractional components, give information on good feasible 
solutions. They have a certain number of variables equal to 1 or 0 and also a 
certain number of variables whose values are close to 1 or 0. This effect can 
be exploited to form a starting feasible solution for subsequent improvement 
heuristics.

We show how the information of the LP-values of the variables can be used
for the construction of a feasible solution for the TSP. We use the terms edge and
variable of the integer programming formulation interchangeably, since they are
in a one-to-one correspondence in our examples. First, we check if the current
LP-solution is the characteristic vector of a tour. If this is the case we
terminate the procedure EXPLOIT LP. Otherwise, edges are sorted according to
their values in the current LP-solution. We give decreasing priorities to edges
as follows:

— edges that are fixed or set to 1, 

— edges equal to 1 or close to 1 in the current LP, 

— edges occurring in several successive LPs. 

This list is scanned and edges become part of the tour if they do not produce a
subtour with the edges selected so far. This gives a system of paths and isolated
nodes which now have to be connected. To this end the savings heuristic of
Clarke and Wright, originally developed for vehicle routing problems, can be
used, since the TSP can be considered as a special vehicle routing problem
involving only one vehicle. The resulting feasible solution can be improved by
local improvement heuristics like Lin-Kernighan [48].

The savings heuristic basically consists of successively merging partial tours
to obtain a Hamiltonian tour. We select one node as base node and form partial
tours by connecting this base node to the end nodes of each of the paths
obtained in the selection step and also adding a pair of edges to nodes not
contained in any path. Then, as long as more than one subtour is left, we
compute for every pair of subtours the savings achieved if the subtours are
merged by deleting in each subtour an edge to the base node and connecting the
two open ends. The two subtours giving the largest savings are merged. Edges
that are fixed or set to 0 should be avoided for connecting paths.
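
The edge selection step of EXPLOIT LP can be sketched as follows. The sketch is deliberately simplified and rests on assumptions: it ranks edges only by the first two priorities (fixed or set to 1, then decreasing LP value), omits the priority for edges occurring in several successive LPs, and does not include the subsequent savings heuristic. The function name and data layout are hypothetical.

```python
def select_path_edges(n, lp_values, fixed_to_one=frozenset()):
    """Greedily select edges that form a system of node-disjoint paths.

    n:            number of nodes 0..n-1
    lp_values:    {(u, v): value of the edge in the current LP-solution}
    fixed_to_one: edges fixed or set to 1 (highest priority)
    """
    parent = list(range(n))

    def find(v):
        # Union-find with path halving; two nodes with the same root are
        # already connected by selected edges.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Fixed/set edges first, then edges by decreasing LP value.
    ranked = sorted(lp_values, key=lambda e: (e not in fixed_to_one, -lp_values[e]))
    degree = [0] * n
    chosen = []
    for u, v in ranked:
        # Accept the edge only if it keeps all degrees <= 2 and closes no subtour.
        if degree[u] < 2 and degree[v] < 2 and find(u) != find(v):
            parent[find(u)] = find(v)
            degree[u] += 1
            degree[v] += 1
            chosen.append((u, v))
    return chosen
```

The returned edges form paths; nodes that appear in no selected edge are the isolated nodes that the connection step has to attach.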

3.5 Sparse Solution and Column Generation 

Often combinatorial optimization problems involve a very large number of
variables, yet a feasible solution is comparatively sparse. For instance, the
TSP on a complete graph of n nodes has $\binom{n}{2}$ variables, yet a tour
consists of only n edges. Hence, the computational process can be accelerated
if a suitable subset of the edges is initially selected and appropriately
augmented during the solution of the problem whenever this is required for the
correctness of the algorithm. However, sparse graph techniques cannot be
applied to problems with a dense solution structure like the maximum cut
problem (see Section 4.1). Sparse graph techniques for the TSP have been
introduced by Grötschel and Holland.

We present techniques exploiting the sparsity of solutions only for combina- 
torial optimization problems defined on graphs. However this technique can be 
generalized for other problems, if the structure of the solutions is sparse, suitable 
subsets of the variables can be computed efficiently, and a method to generate 
the columns of non-active variables is available. 

In order to integrate this technique into the basic algorithm described so
far, we have to deal with LP problems where not only rows but also columns of
the constraint matrix are dynamically created. The resulting more general
algorithm, which in [56] was called branch-and-cut, is described in the
flowchart shown in Fig. 9. With respect to the basic algorithm, the gray boxes
in the flowchart have to be added or changed. A subproblem in which an
infeasible LP is detected cannot be fathomed at once; rather, it must be
checked whether the addition of non-active variables can restore feasibility.
We explain this process in ADDVARIABLES. Before leaving the bounding part, it
has to be verified in PRICE OUT whether the LP-solution computed on the sparse
graph is also optimum on the complete graph. Only in this case does the
variable lpval become a local lower bound llb. The routine FIX AND SET now has
to be applied more carefully. The procedure SETBYREDCOST can only be applied
after an additional pricing step in which no variable has to be added. This is
also the case for FIXBYREDCOST if the root node of the remaining
branch-and-cut tree is currently processed.



Suitable Sparse Graphs. The initial sparse graph is generated in the procedure
INITIALIZE. For the TSP, a good choice for a sparse graph is the k-nearest
neighbor graph. Another suitable subset of the edges may be the Delaunay
graph. Figures 10, 11, and 12 show an optimum tour through 127 beer gardens of
Augsburg (Germany) together with the 5-nearest neighbor graph and the Delaunay
triangulation. If it cannot be guaranteed that the sparse graph contains a
feasible solution, it should be augmented by the edges of a solution computed
by a heuristic. Padberg and Rinaldi suggest creating a series of feasible
solutions heuristically and initializing the sparse graph with all involved
edges.

In addition to the sparse graph, the edges of the “reserve graph” can be 
computed. These edges are additional “promising” edges that do not belong to 
the sparse graph. For instance, if the sparse graph is the 5-nearest neighbor 
graph, a suitable reserve graph is given by the edges that have to be added to 
get the 10-nearest neighbor graph. The reserve graph can be used in PRICE 
OUT and ADDVARIABLES. 
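
The construction of a nearest neighbor sparse graph together with its reserve graph can be sketched as follows. This is a brute-force illustration for points in the Euclidean plane; the function names are hypothetical, and a serious implementation would use a spatial data structure instead of sorting all distances for every node.

```python
import math


def nearest_neighbor_edges(points, k):
    """Edge set of the k-nearest neighbor graph of a list of coordinate tuples."""
    edges = set()
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))  # store each edge once, as (small, large)
    return edges


def sparse_and_reserve(points, k_sparse=5, k_reserve=10):
    """Sparse graph = k_sparse-nearest neighbor graph; reserve graph = the edges
    that have to be added to obtain the k_reserve-nearest neighbor graph."""
    sparse = nearest_neighbor_edges(points, k_sparse)
    reserve = nearest_neighbor_edges(points, k_reserve) - sparse
    return sparse, reserve
```

With the default parameters this reproduces the 5-nearest/10-nearest neighbor example from the text; the two edge sets are disjoint by construction.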

The algorithm starts working on G, adding and deleting edges (variables) 
dynamically during the optimization process. We refer to the edges in G as 
active edges and to the other edges as non-active edges. 



Add Variables. Variables have to be added to the sparse graph if indicated by 
the reduced costs (handled by PRICE OUT) or if the current LP is infeasible. 
The latter may be caused by two reasons. 

First, some active inequality has a void left hand side, since all involved
variables are fixed or set and removed from the LP, but is violated. If all
coefficients of non-active variables in this inequality are nonnegative, it is
clear from our strategy for variable fixings and settings that the
branch-and-cut node is fathomed (all constraints are assumed to be of the form
$a_i^T x \le b_i$). However, if there is a non-active variable with a negative
coefficient, this variable may remove the violation. So it is added to the LP.

Fig. 9. Flowchart of a branch-and-cut algorithm.

Fig. 10. Shortest tour through 127 beer gardens in Augsburg (Germany).

Fig. 11. The 5-nearest neighbor graph for the beer gardens in Augsburg contains
all but 3 edges of the best tour.

Fig. 12. In the Delaunay graph for the beer gardens in Augsburg only 2 edges of
the best tour are missing.

Second, the above condition does not apply, and the infeasibility is detected 
by the LP-solver. In this case a pricing step is performed in order to find out 
if the dual feasible LP-solution is dual feasible for the entire problem. Variables 
that are not in the current sparse graph (i.e., are assumed to be at their lower 
bound 0) and have negative reduced cost are added to the current sparse graph. 
An efficient way of computing the reduced costs is outlined in PRICE OUT. 

If new variables have been added, the new LP is solved. Otherwise, we try to
add new variables to the LP by a more elaborate method in order to make it
feasible. The LP-value lpval, which is the objective function value
corresponding to the dual feasible basis where primal infeasibility is
detected, is a lower bound for the objective function value obtainable in the
current branch-and-cut node. So if lpval > gub, the branch-and-cut node can be
fathomed.

Otherwise, we first mark all infeasible variables, i.e., all those that violate 
the lower or the upper bound and all the negative slack variables. 

Let e be a non-active variable and $r_e$ be the reduced cost of e. An edge e
is taken as a candidate only if $lpval + r_e \le gub$. Let B be the basis
matrix corresponding to the dual feasible LP-solution at which the primal
infeasibility was detected. For each candidate e let $a_e$ be the column of the
constraint matrix corresponding to e and solve the system $B \bar{a}_e = a_e$.
Let $\bar{a}_e(b)$ be the component of $\bar{a}_e$ corresponding to the basic
variable $x_b$. Increasing $x_e$ “reduces some infeasibility” if one of the
following holds:

— $x_b$ is an infeasible structural variable (i.e., corresponding to an edge of
G) and

($x_b < 0$ and $\bar{a}_e(b) < 0$) or ($x_b > 1$ and $\bar{a}_e(b) > 0$),

— $x_b$ is a negative slack variable and

$\bar{a}_e(b) < 0$.

In such a case the variable e is added to the set of active variables and the
marks are removed from all infeasible variables whose infeasibility can be
reduced by increasing $x_e$. This can be done in the same hierarchical fashion
as described below in PRICE OUT.

If variables can be added, the new LP is solved; otherwise the branch-and-cut
node is fathomed. Note that all systems of linear equations that have to be
solved have the same matrix B, and only the right hand side $a_e$ changes. This
can be utilized by computing a factorization of B only once; in fact, the
factorization can be obtained from the LP-solver for free. Further details on
this algorithm can be found in the literature.
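
The test whether increasing a candidate variable reduces some infeasibility can be sketched as follows. The sketch assumes that the transformed column (the solution of the linear system with the basis matrix B and the candidate's column as right hand side) has already been obtained, for example from the LP-solver's factorization; the function name, the `(kind, value)` encoding of basic variables, and the tolerance are hypothetical.

```python
def reducible_infeasibilities(basic, abar_e, eps=1e-9):
    """Indices b of infeasible basic variables whose infeasibility is reduced
    when the non-active variable e is increased from 0.

    basic:  list of (kind, value) per basic variable, kind being "structural"
            (bounds 0 and 1) or "slack" (lower bound 0)
    abar_e: transformed column of e, i.e. the solution of B * abar_e = a_e
    """
    hits = []
    for b, ((kind, value), coef) in enumerate(zip(basic, abar_e)):
        if kind == "structural":
            below = value < -eps and coef < -eps        # x_b < 0 and abar_e(b) < 0
            above = value > 1 + eps and coef > eps      # x_b > 1 and abar_e(b) > 0
            if below or above:
                hits.append(b)
        elif kind == "slack" and value < -eps and coef < -eps:
            hits.append(b)                              # negative slack, abar_e(b) < 0
    return hits
```

If the returned list is non-empty, e would be added to the active variables and the marks of the listed basic variables removed, as described above.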



Price Out. Pricing is necessary before a branch-and-cut node can be fathomed.
Its purpose is to check whether the LP-solution computed on the sparse graph is
valid for the complete graph, i.e., whether all non-active variables “price
out” correctly. If this is not the case, non-active variables with negative
reduced cost are added to the sparse graph and the new LP is solved using the
primal simplex method starting with the previous (now primal feasible) basis;
otherwise the local lower bound llb and possibly the global lower bound glb can
be updated.

Although the correctness of the algorithm does not require this, additional
pricing steps can be performed every k (e.g., k = 10) solved LPs. The effect is
that non-active variables which are required in a good or optimum feasible
solution tend to be added to the sparse graph early in the computational
process. If no variables are added, one can also try to fix or set variables by
reduced cost criteria.

Let y be the vector of the dual variables, $A_e$ the column of an inactive
variable e in the matrix A defined by the active constraints, and $c_e$ the
corresponding objective function coefficient. Then the reduced cost of the
variable e is given by $r_e = c_e - y^T A_e$.
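
As a small illustration of this formula, the following sketch computes the reduced cost of one inactive variable from a sparse representation of its column. The function name and the dictionary layout of the column are assumptions made for this example.

```python
def reduced_cost(c_e, column, y):
    """Reduced cost r_e = c_e - y^T A_e of an inactive variable e.

    c_e:    objective function coefficient of e
    column: sparse column A_e as {row index of an active constraint: coefficient}
    y:      list of dual values, indexed by active constraint
    """
    return c_e - sum(y[i] * a for i, a in column.items())


# Example: e has coefficient 1 in constraints 0 and 1 (e.g., two degree equations).
r_e = reduced_cost(4.0, {0: 1.0, 1: 1.0}, [3.0, 2.0, 0.5])  # 4 - (3 + 2) = -1
```

Here the reduced cost is negative, so the variable does not price out correctly and would be added to the sparse graph.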

Computing the reduced cost for all inactive edges takes great computational
effort, but it can be performed significantly faster using an idea of Padberg
and Rinaldi. If our current branch-and-cut node is the root of the remaining
branch-and-cut tree, it can be checked whether the reduced cost $r_e$ of a
non-active variable e satisfies the relation $lpval + r_e > gub$. In this case
this non-active edge can be discarded forever. During the systematic
enumeration of all edges of the complete graph, an explicit list of those edges
which remain possible candidates can be made. In the early steps of the
computation, too many such edges remain, so that this list cannot be completely
stored with reasonable memory consumption. Instead, a partial list is stored in
a fixed size buffer and the point where the systematic enumeration has to be
resumed after considering the edges in the buffer is memorized. In later steps
of the computation there is a good chance that the complete list fits into the
buffer, so that later calls of the pricing routine become much faster than
early ones.

In [43] a further modification of the procedure PRICE OUT is presented. It can
be observed that the reduced costs of edges not belonging to the reserve graph
are seldom negative if the reserve graph is appropriately chosen. Hence, a
hierarchical approach is advantageous: only if the “partial pricing”
considering the edges of the reserve graph has not added variables does the
reduced cost of all non-active variables have to be checked.
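
The hierarchical pricing idea can be sketched as follows. This is an illustration only, not the procedure of [43]: the function name is hypothetical, and `reduced_cost` is assumed to be a caller-supplied function that returns the reduced cost of a non-active edge.

```python
def price_out(reserve_edges, other_edges, reduced_cost, eps=1e-9):
    """Return the non-active edges that should be added to the sparse graph.

    reserve_edges: non-active edges of the reserve graph (priced first)
    other_edges:   all remaining non-active edges
    reduced_cost:  callable mapping an edge to its reduced cost (assumed given)
    """
    entering = [e for e in reserve_edges if reduced_cost(e) < -eps]
    if entering:
        return entering  # partial pricing found variables; skip the full scan
    # Only now is the expensive pass over all other non-active edges needed.
    return [e for e in other_edges if reduced_cost(e) < -eps]
```

An empty result means that all non-active variables price out correctly, so the LP value is valid for the complete graph.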



Computation of Global Upper Bounds. Sparse graph techniques can also be used
for the computation of feasible solutions, whether heuristics are applied only
during the initialization phase or are integrated in the cutting plane
generation phase.

A candidate subgraph is a subgraph of the complete graph on n nodes containing
reasonable edges in the sense that they are “likely” to be contained in a good
feasible solution. These edges are taken with priority in the various
heuristics, thus avoiding the consideration of the majority of edges, which are
assumed to be of no importance. Various candidate subgraphs and the question of
how to compute them efficiently are discussed in the literature.

The candidate subgraph can be related to the set of active variables in the 
linear programming problems, if the heuristics are integrated into the cutting 
plane generation part as described before. Basically, the candidate subgraph is 
initialized with some graph (e.g., the empty graph) and then edges are added 
whose corresponding values are close to one. In order to avoid too extensive 
growing of the candidate subgraph and to avoid being biased by LPs that were 
not recently solved, the candidate subgraph should be cleared in certain intervals 
(e.g., every 20th cutting plane phase) and reinitialized. 

It should be noted that the feasible solution found by the heuristics should 
not be restricted to only using edges of the sparse graph. These edges are only 
considered with priority and lead to an acceptable CPU time. Usually, heuristics 
will introduce edges that are not active in the LP. These edges are added to the 
set of active variables. This is based on the assumption that these edges are also 
important for the upper bound computations and would be added to the LP 
in some pricing step anyway. This way the set of active variables is augmented 
without pricing. 



4 Some More Advanced Topics 

The success of a branch-and-cut algorithm relies, to a great extent, on a careful 
design of all its components. This is true, in particular, when one wants to treat 
instances of very large size. Since in this chapter we are dealing with combinato- 
rial optimization problems, we show here how the structure of these problems can 
sometimes be used to handle instances of large size by branch-and-cut. Due to
space limitations we address only two issues. First, we describe a reduction
technique that is used in the separation problem for the TSP and the maximum
cut problem. Second, we deal with problems where the LP relaxation used for
the branch-and-cut algorithm has exponentially many inequalities and variables.

Before we start, let us compile a few notions in polyhedral combinatorics. A
polytope $P \subseteq \mathbb{R}^n$ can be characterized either as the convex
hull of a finite set $I \subseteq \mathbb{R}^n$, i.e.,

$$P = \mathrm{conv}(I),$$

or as the solution set of a finite system of inequalities $Ax \le a$ and
equations $Bx = b$, i.e.,

$$P = \{x \in \mathbb{R}^n \mid Ax \le a,\ Bx = b\}.$$

As we have seen, the first characterization, also called inner description,
arises naturally when the points in I are the characteristic vectors of certain
combinatorial objects over which we want to optimize a linear objective
function. The second characterization is also called an outer description. Let
us assume that $Bx = b$ is a minimal equation system, i.e., the matrix B has
full row rank m and there are no hidden equations in the system $Ax \le a$.
Then the polytope P has dimension $\dim(P) = n - m$. (For a full dimensional
polytope the equation system $Bx = b$ is void.)

The inequality $a^T x \le a_0$ is valid for P if
$P \subseteq \{x \in \mathbb{R}^n \mid a^T x \le a_0\}$. For any valid
inequality $a^T x \le a_0$ for P, the polytope
$\{x \in P \mid a^T x = a_0\}$ is called a face of P. Faces of P different from
P itself are called proper faces of P. A proper face of minimum dimension 0 is
called a vertex of P. If P arises as the convex hull of the characteristic
vectors of certain combinatorial objects, the vertices of P are exactly the
integral (0/1 valued) characteristic vectors. A polytope in which all vertices
have only integral components is called an integral polytope. A proper face of
maximum dimension $\dim(P) - 1$ is called a facet of P. The most compact outer
description of P consists of a minimal equation system $Bx = b$ and an
inequality system $Ax \le a$ that consists only of facet defining inequalities,
one for each facet of P.

A special case of a celebrated result of Grötschel, Lovász, and Schrijver
says that we can optimize a linear objective function over a polytope P in
polynomial time if and only if we can solve the separation problem for P in
polynomial time.

4.1 Reduction and Lifting for Separation 

In the previous section, we have used the TSP as an illustrating example. We 
have utilized the fact that the degree equations and the subtour elimination con- 
straints constitute an integer linear programming formulation of the TSP. We 
have also discussed the use of sparse graph techniques that are crucial for a suc- 
cessful branch-and-cut approach to the TSP. The efficient solution of the separa- 
tion problem for the subtour elimination constraints via a minimum capacity cut 
calculation has also been mentioned. However, the striking success of the
branch-and-cut approach to the exact solution of large TSP instances relies on
a much more detailed knowledge of the facial structure of the TSP polytope
$P^n_{TSP}$ and on practically efficient heuristics for the separation of
inequalities other than the subtour elimination constraints. Over the years,
many classes of facet defining, or simply valid, inequalities for $P^n_{TSP}$
have been identified; see Naddef for a recent survey. The state of the art in
TSP solving largely depends on how much of this knowledge can be exploited by
efficient separation algorithms. In contrast to a very successful new
“projection cut paradigm” for TSP-separation
[→ Applegate/Bixby/Chvátal/Cook], the traditional approach of attempting to
separate a priori known inequalities is called the “template paradigm”. We do
not have the space to describe the various “template” classes of (facet
defining) valid inequalities for the TSP, but we would like to point out how
reduction and lifting help in separation, regardless of which paradigm is used.
We sketch the situation for the TSP (and refer to
[→ Applegate/Bixby/Chvátal/Cook] for details on the “project and lift”
paradigm) and then discuss the case of the maximum cut problem in more detail.

Let $P^n_{SUBTOUR}$ be the solution set of the integer linear programming
formulation of the TSP without the integrality constraints. $P^n_{SUBTOUR}$ is
called the subtour polytope and clearly

$$P^n_{SUBTOUR} \supseteq P^n_{TSP} = \mathrm{conv}\left(P^n_{SUBTOUR} \cap \{0,1\}^{\binom{n}{2}}\right).$$

By the polynomial equivalence of separation and optimization we can optimize
over $P^n_{SUBTOUR}$ in polynomial time, and a pure cutting plane algorithm can
provide an $x \in P^n_{SUBTOUR}$ efficiently. With such a point we associate
its support graph, which is a weighted graph with n nodes whose edges
correspond to nonzero components of x; each edge has a weight given by the
value of the associated component of x. In a branch-and-cut algorithm for the
TSP it is therefore reasonable to make sure that the point x to be separated by
further inequalities is contained in the subtour polytope, i.e., for any
$\emptyset \ne W \subsetneq V$ we have that
$x(\delta(W)) := \sum_{e \in \delta(W)} x_e \ge 2$. While, as mentioned in
Section 3, the exact separation of the subtour elimination inequalities can be
solved very efficiently even for points x whose support graph has several
thousand nodes and edges, for these graphs it would be next to impossible to
solve the separation problem for other classes of inequalities in a reasonable
amount of time.

A set $S \subset V$ of at least two nodes is called tight if
$x(\delta(S)) = 2$. Typically the support graph of x has several tight sets. If
we contract a tight set, i.e., if we identify all its nodes into a single node,
remove the loops generated by this process, and replace the resulting parallel
edges by a single edge with weight equal to the sum of the weights of the
removed parallel edges, we obtain a smaller graph. It is not difficult to see
that this graph is the support graph of a point $x'$ belonging to
$P^{n'}_{SUBTOUR}$, where $n' < n$.
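
The contraction of a node set in a support graph can be sketched as follows. The function names and the convention of labelling the contracted node by the smallest node of the set are assumptions made for this example.

```python
def cut_weight(x, S):
    """x(delta(S)): total weight of the support graph edges with exactly one end in S."""
    return sum(w for (u, v), w in x.items() if (u in S) != (v in S))


def contract(x, S):
    """Contract the node set S of the support graph x = {(u, v): weight}.

    All nodes of S are identified with the single node min(S), loops are
    removed, and parallel edges are replaced by one edge carrying the sum of
    their weights.
    """
    rep = min(S)
    contracted = {}
    for (u, v), w in x.items():
        u2 = rep if u in S else u
        v2 = rep if v in S else v
        if u2 == v2:
            continue  # both ends inside S: the edge becomes a loop and is dropped
        key = (min(u2, v2), max(u2, v2))
        contracted[key] = contracted.get(key, 0.0) + w
    return contracted
```

If S is tight, the contracted node has weighted degree `cut_weight(x, S)`, which equals 2, so the degree equations remain satisfied in the smaller graph.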

These observations suggest the following procedure:

1. Contract some selected tight sets and produce a new point
$x' \in P^{n'}_{SUBTOUR}$.

2. Find inequalities valid for $P^{n'}_{TSP}$ and violated by $x'$.

3. Extend such inequalities to inequalities valid for $P^n_{TSP}$ and violated
by the original point x.

Unfortunately, it may happen that after contracting some tight sets, the new
point $x'$ belongs to $P^{n'}_{TSP}$, i.e., it can be expressed as a convex
combination of the characteristic vectors of a set of Hamiltonian cycles.
Therefore, no valid inequality for $P^{n'}_{TSP}$ is violated by $x'$ in this
case. Padberg and Rinaldi show how this happens and give sufficient conditions
for a tight set to be contracted without producing such undesired situations.

Step 3 of the above procedure can be effectively solved using a lifting theorem
of Naddef and Rinaldi that says that, under conditions that are satisfied by
all facet defining inequalities for the TSP polytope known to date, a facet
defining inequality for $P^{n'}_{TSP}$ violated by $x'$ by an amount v can be
easily lifted to a facet defining inequality for $P^n_{TSP}$ violated by x by
the same amount v.

Successful approaches to TSP separation differ mainly in how the separation
for $P^{n'}_{TSP}$ is performed; e.g., Padberg/Rinaldi, Naddef/Thienel
[50, 51], and Christof/Reinelt make exclusive use of the template paradigm (the
latter makes special use of a library of all facets of $P^n_{TSP}$ for
$n \le 10$), while [→ Applegate/Bixby/Chvátal/Cook] go beyond it. The survey by
Naddef is the latest account of the state-of-the-art techniques in exact TSP
solving by branch-and-cut.

Earlier in this chapter we transformed the break minimization problem to the
maximum cut problem in a special sparse graph G = (V, E) with edge weights.
Exact solutions to maximum cut problems in sparse graphs play an important
role in statistical physics, since the determination of a minimum energy state
of a spin glass reduces to a maximum cut problem in a sparse graph; see, e.g.,
De Simone et al. for details. With the branch-and-cut algorithm described
there, maximum cut instances on toroidal grids of size up to 100 x 100 (see
Fig. 13) could be routinely solved to optimality.

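Analogously to the TSP formulation, we can give an integer linear programming
formulation for the maximum cut problem:

$$\begin{array}{lll}
\text{maximize} & \sum_{e \in E} c_e x_e & \\
\text{subject to} & \sum_{e \in F} x_e - \sum_{e \in C \setminus F} x_e \le |F| - 1 & \text{for all } F \subseteq C,\ |F| \text{ odd, for each cycle } C \text{ of } G, \\
 & 0 \le x_e \le 1 & \text{for all } e \in E, \\
 & x_e \in \{0, 1\} & \text{for all } e \in E.
\end{array}$$

The nontrivial inequalities are called cycle inequalities. Therefore, we call
the polytope $P^G_{CYCLE}$ defined by the cycle inequalities and the trivial
inequalities the cycle polytope. As usual, the cut polytope $P^G_{CUT}$ is the
convex hull of the characteristic vectors of all cuts and we have

$$P^G_{CYCLE} \supseteq P^G_{CUT} = \mathrm{conv}\left(P^G_{CYCLE} \cap \{0,1\}^E\right).$$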

Fig. 13. 100 x 100 grid on a torus.

A cycle inequality defines a facet of $P^G_{CUT}$ when the defining cycle C is
chordless. A trivial inequality $x_e \ge 0$ or $x_e \le 1$ defines a facet of
$P^G_{CUT}$ when e is not contained in a triangle of G. All non-facet defining
inequalities can be eliminated from the linear description of $P^G_{CYCLE}$.
Therefore, when G is the complete graph $K_p$, only the inequalities

$$\begin{aligned}
x_{ij} + x_{ik} + x_{jk} &\le 2 \\
x_{ij} - x_{ik} - x_{jk} &\le 0 \\
-x_{ij} + x_{ik} - x_{jk} &\le 0 \\
-x_{ij} - x_{ik} + x_{jk} &\le 0
\end{aligned}$$

remain.
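
For the complete graph, the four triangle inequalities can be separated by simple enumeration of all node triples. The following sketch illustrates this; the function name, the tuple encoding of the returned inequalities, and the tolerance are assumptions, and for large p one would restrict the enumeration rather than scan all triples.

```python
from itertools import combinations

# Sign patterns (s_ij, s_ik, s_jk) and right hand sides of the four
# triangle inequalities s_ij*x_ij + s_ik*x_ik + s_jk*x_jk <= rhs.
TRIANGLE_PATTERNS = [((1, 1, 1), 2), ((1, -1, -1), 0), ((-1, 1, -1), 0), ((-1, -1, 1), 0)]


def violated_triangles(p, x, eps=1e-9):
    """Enumerate triangle inequalities of K_p violated by the point x.

    x: {(i, j): value} with i < j for every edge of the complete graph on p nodes.
    Returns a list of ((i, j, k), sign pattern, right hand side).
    """
    found = []
    for i, j, k in combinations(range(p), 3):   # yields i < j < k
        values = (x[(i, j)], x[(i, k)], x[(j, k)])
        for signs, rhs in TRIANGLE_PATTERNS:
            lhs = sum(s * v for s, v in zip(signs, values))
            if lhs > rhs + eps:
                found.append(((i, j, k), signs, rhs))
    return found
```

Characteristic vectors of cuts satisfy all four inequalities, so the function returns an empty list for them; a fractional point such as all edge values equal to 0.8 on a triangle violates the first inequality.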

The polytope $P^{K_p}_{CUT}$ has been extensively studied and a number of
families of valid inequalities, several of them facet defining, have been
described in survey articles. For some of these inequalities separation
procedures have been proposed.

All these results concern the maximum cut problem on complete graphs. But
how can they be used when the graph is not complete? A trivial way to exploit
the current knowledge about $P^{K_p}_{CUT}$ for an arbitrary graph G is to
add the missing edges to G in order to obtain an artificial complete graph, and
to assign a zero weight to them. Such a technique has been successfully used
for other combinatorial problems, where the sparsity of the original graph can
actually be exploited to handle the artificial complete graph efficiently. This
is the case, for example, in the TSP, where even if the original graph is not
sparse, all the computations are carried out on a very sparse subgraph. To do
so, it is assumed that the edges of a suitably large subset have no
intersection with an optimum solution and thus their corresponding variables
are permanently set to zero. A proof that this assumption is correct is
eventually provided by the solution algorithm. In contrast, for the maximum cut
problem there is no obvious way to exploit the sparsity of the original problem
or to use a small working subset of the original edges. This means that if one
uses the above technique of completing the graph with edges of zero weight, the
exact solution of the maximum cut problem on sparse graphs has the same
computational difficulties as on complete graphs.

Unfortunately, interesting applications of the maximum cut problem, like 
the study of minimum energy configurations in spin glasses, require the exact 
solution of instances with several thousand nodes. Therefore, the solution of 
these instances is out of reach, unless the problem is solved in the original sparse 
graph. 

Why is there such a difference between the TSP and the maximum cut problem,
as far as the exploitation of the descriptions of their polyhedra is concerned?
We try to make this point clearer with some simple observations.

Let $\mathcal{H}$ and $\mathcal{H}'$ be the sets of all the Hamiltonian cycles
of the graphs G and $G \setminus e$, respectively, where $G \setminus e$ is
obtained from G by removing edge e. Let P and P' be the convex hulls of the
characteristic vectors of all the elements of $\mathcal{H}$ and $\mathcal{H}'$,
respectively. Clearly, $\mathcal{H}'$ is a subset of $\mathcal{H}$ and consists
of all the elements of $\mathcal{H}$ that do not contain edge e.
Correspondingly, P' is the intersection of P with the hyperplane
$\{x \mid x_e = 0\}$, thus P' is a face of P. As a consequence, any valid
inequality for P is also valid for P' and, by combining it with the equation
$x_e = 0$, valid for P', it can be turned into an equivalent inequality
obtained from the original one by dropping the term in $x_e$. Thus, a linear
description of P' is readily obtained from a linear description of P.

Let now $\mathcal{K}$ and $\mathcal{K}'$ be the sets of all cuts of G and
$G \setminus e$, respectively, and let $P^G_{CUT}$ and $P^{G \setminus e}_{CUT}$
be the corresponding polytopes. All the elements of $\mathcal{K}$ that do not
contain edge e are in $\mathcal{K}'$. Moreover, if $K \in \mathcal{K}$ and
$e \in K$ then $K \setminus \{e\} \in \mathcal{K}'$. Therefore, the
characteristic vectors of elements of $\mathcal{K}'$ are projections of
characteristic vectors of elements of $\mathcal{K}$ onto the subspace
$\{x \mid x_e = 0\}$. Consequently, $P^{G \setminus e}_{CUT}$ is the projection
of $P^G_{CUT}$ onto that subspace. If we have a system of linear inequalities
for $P^G_{CUT}$ or for one of its relaxations that can be described in a
compact combinatorial way, we would like to have a similar compact description
also for the linear system of its projection. However, the linear system of the
projection, which can be obtained via a Fourier-Motzkin procedure [→ Balas],
can get extremely complex and it is very unlikely that a general criterion can
be devised to describe it in a compact way. The only lucky (and non-trivial)
case we are aware of is when the relaxation of $P^G_{CUT}$ is the cycle
polytope of the graph G described by the cycle and trivial inequalities. Its
projection onto $\{x \mid x_e = 0\}$, as proved by Barahona, is again the cycle
polytope of the graph $G \setminus e$.

After these observations it seems that the only reasonable way to proceed in
order to have combinatorial descriptions of linear systems of relaxations of
$P_{\mathrm{CUT}}^{G}$ is to consider graphs with a special structure. Unfortunately, after
the publication of the paper by Barahona and Mahjoub [10], where the study of a
description of $P_{\mathrm{CUT}}^{G}$ by linear inequalities was initiated and where the
inequalities (5) were introduced, very little effort was devoted to the study of
$P_{\mathrm{CUT}}^{G}$ for arbitrary graphs. Nevertheless, the optimum solution of spin glass
instances on toroidal grid graphs with size up to 22 500 is reported in [25]. The
algorithm used to obtain these results was a branch-and-cut algorithm based only
on the cycle inequalities (5), for which very effective “ad hoc” separation
procedures were designed.

Unfortunately, the cycle inequalities are far from being sufficient to solve spin
glass instances on more complex graphs. Some work in progress [40] concerns
new projection and lifting approaches to the separation for $P_{\mathrm{CUT}}^{G}$. We now
give a brief outline and refer to the publication for the details, which are much
more complicated than in the case of the TSP.

We are given a point $x_n$ that satisfies all the inequalities (5) but does
not belong to $P_{\mathrm{CUT}}^{G}$, G being an arbitrary graph with n nodes. We want to find
an inequality valid for $P_{\mathrm{CUT}}^{G}$ (possibly facet defining) that is not satisfied
by $x_n$. To do so, we want to use the algorithmic and the structural results that
are available for the cut polytope for a complete graph.

First, by a sequence of operations on $x_n$ that amount to contracting 2-node
sets of its support graph corresponding to the endnodes of integral weighted
edges, such a point is transformed to $x_{n'} \in \mathbb{R}^{E'}$, where n' is usually much
smaller than n. Differently from the corresponding TSP case, the point $x_{n'}$ is
always guaranteed to be outside $P_{\mathrm{CUT}}^{G'}$ but to satisfy (5). It can be seen as
a fractional solution of a cutting plane algorithm applied to a maximum cut
instance on a smaller and denser graph G' = (V', E') where |V'| = n'.

At this point all the machinery available for the maximum cut problem on
complete graphs can be used for each complete subgraph $K_p$ of G'. Therefore,
some separation procedures for the cut polytope on complete graphs are applied
to the restriction of $x_{n'}$ to the edges of these components that (hopefully)
generate an inequality $a_{n'} x_{n'} \ge \alpha$, valid for $P_{\mathrm{CUT}}^{G'}$ and violated by
$x_{n'}$ by an amount v.

Finally, a sequence of lifting procedures is applied to $a_{n'} x_{n'} \ge \alpha$ that
transforms it to an inequality $a_n x_n \ge \beta$ valid for $P_{\mathrm{CUT}}^{G}$ and violated by
$x_n$ by the same amount v.

As a by-product, one of these lifting procedures provides a simple way to
generate facet defining inequalities for $P_{\mathrm{CUT}}^{G}$. Namely, under certain
conditions, this procedure, applied to a facet defining inequality for
$P_{\mathrm{CUT}}^{G'}$, produces not only a valid, but also a facet defining inequality for
$P_{\mathrm{CUT}}^{G}$.

In conclusion, these separation and lifting procedures enrich the description 
by linear inequalities of the cut polytope on arbitrary graphs and, at the same 
time, constitute an algorithmic tool for exactly solving the maximum cut prob- 
lem on these graphs. 
4.2 Branch-and-Cut-and-Price



Suppose we are given a combinatorial optimization problem $\mathcal{P}$ defined over a
ground set E. A feasible solution for $\mathcal{P}$ is a suitable subset F of E. Let us
denote by $\mathcal{F}$ the set of all feasible solutions for $\mathcal{P}$. A weight $w_e$ is given
for each element e of E and the problem consists of finding an element
$F \in \mathcal{F}$ of maximum total weight $\sum_{e \in F} w_e$.
Consider the polytope defined as the convex hull of the characteristic vectors
of all the feasible solutions $F \in \mathcal{F}$. This polytope can be described by a system
of linear inequalities in the space $\mathbb{R}^{E}$

$$
\begin{aligned}
Dx &\le g \\
0 \le x &\le 1
\end{aligned}
\tag{6}
$$



that typically has exponentially many rows. Let us assume that we have two
alternative ways to solve $\mathcal{P}$: we have at hand a separation procedure for the
associated polytope, thus we can solve it by a polyhedral cutting plane algorithm,
but $\mathcal{P}$ can also be solved by a combinatorial algorithm. For example, $\mathcal{P}$ could
be the weighted matching problem; in this case E would be the edge set of a given
weighted graph and $\mathcal{F}$ the set of all its matchings. We know that this problem
can be solved by a polyhedral cutting plane algorithm using the Padberg-Rao
algorithm to generate violated blossom inequalities, but we can also solve the
problem with Edmonds’ blossom algorithm, which is of combinatorial type.

Let us consider now a different situation where not all elements of the set
$\mathcal{F}$ are feasible, but only those whose characteristic vectors satisfy a set of
linear inequalities $Ax \le b$ (we assume that the cardinality of this set is bounded
by a polynomial in |E|). In other words, we want to find an integral optimum
solution of the system

$$
\begin{aligned}
Ax &\le b \\
Dx &\le g \\
0 \le x &\le 1.
\end{aligned}
\tag{7}
$$

The system (7) does not define, in general, an integral polytope as the system
(6) does; moreover, the corresponding optimization problem $\mathcal{P}'$ is typically
more difficult to solve than $\mathcal{P}$. Therefore, we may need to use branch-and-cut
techniques to solve it. First of all, let us see how we can solve the linear
programming relaxation defined by the system (7). There are two possible ways.
One is to use a polyhedral cutting plane algorithm by generating a subset of
the inequalities of (6), as we assumed that we know how to generate violated
inequalities from this system. The second possibility is to use the decomposition
of Dantzig-Wolfe [22] in the following way.

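We know that any point x of the polytope described by (6) is a convex
combination of the characteristic vectors of some elements of $\mathcal{F}$. Denoting by
$\chi^F$ the characteristic vector of $F \in \mathcal{F}$, we have that

$$
x = \sum_{F \in \mathcal{F}} y_F \chi^F, \quad \text{with} \quad \sum_{F \in \mathcal{F}} y_F = 1 \ \text{and} \ y \ge 0.
\tag{8}
$$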
By substituting (8) into (7) we obtain a new system of inequalities

$$
\begin{aligned}
\bar{A} y &\le b \\
\mathbb{1}^T y &= 1 \\
y &\ge 0,
\end{aligned}
\tag{9}
$$

where $\bar{A} = [\bar{A}_F]_{F \in \mathcal{F}} = [A \chi^F]_{F \in \mathcal{F}}$ and $\mathbb{1}$ is a vector of all ones.
Optimizing a linear function $w^T x$ over the system (7) is equivalent to optimizing
the linear function $\bar{w}^T y$ over the system (9), where
$\bar{w} = [\bar{w}_F]_{F \in \mathcal{F}} = [w^T \chi^F]_{F \in \mathcal{F}}$. We
have thus replaced a linear program with |E| variables and exponentially many
constraints with another linear program with a number of constraints that is
polynomial in |E| and with exponentially many variables.

We call the system (7) the ground set formulation and the system (9) the
subset formulation of problem $\mathcal{P}'$.

The maximization for the subset formulation can be solved with the following
column generation algorithm. We select a “small” subset $\mathcal{X}$ of $\mathcal{F}$ and we solve
the following linear program, where the subscript $\mathcal{X}$ denotes the restriction of
the vectors and of the constraint matrix to the subset of variables associated
with $\mathcal{X}$,

$$
\begin{aligned}
\text{maximize} \quad & \bar{w}_{\mathcal{X}}^T y_{\mathcal{X}} \\
\text{subject to} \quad & \bar{A}_{\mathcal{X}} y_{\mathcal{X}} \le b \\
& \mathbb{1}^T y_{\mathcal{X}} = 1 \\
& y_{\mathcal{X}} \ge 0.
\end{aligned}
\tag{10}
$$

Let y be the extension of the optimum solution of (10) to the whole set $\mathcal{F}$
obtained by adding zero valued components. Let u and v be the dual optimum
solutions associated with the first set of constraints and with the equation,
respectively.

In order to prove that y is the vector that maximizes $\bar{w}^T y$ over the set defined
by (9) we have to compute the maximum among the reduced costs of all its
components, by solving the following column generation subproblem:

$$
\begin{aligned}
\text{maximize} \quad & \bar{w}_F - u^T \bar{A}_F - v \\
& = w^T \chi^F - u^T A \chi^F - v \\
& = (w^T - u^T A) \chi^F - v \\
\text{subject to} \quad & F \in \mathcal{F}.
\end{aligned}
\tag{11}
$$

If the optimum value of (11) is less than or equal to zero, we have proved the
optimality of y; otherwise, let the maximum be attained at the solution F. Then
we add F to the set $\mathcal{X}$, we extend the matrix $\bar{A}_{\mathcal{X}}$ and the vector $\bar{w}_{\mathcal{X}}$ with
the column $A \chi^F$ and the component $w^T \chi^F$, respectively, and we solve the linear
program (10) again.

Observe that the column generation subproblem is precisely the original
optimization problem $\mathcal{P}$ (a weighted matching problem in our example) with
$w^T - u^T A$ as the vector of weights. Since we have assumed up front that an
(efficient) algorithm for $\mathcal{P}$ is available, (9) can be solved with a linear
optimizer and a sequence of calls to this algorithm.
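
The pricing step can be made concrete on a toy instance. The following C++ sketch
is not taken from the chapter: it assumes that the family of feasible solutions is
small enough to be listed explicitly, so that subproblem (11) can be solved by
brute force, whereas a real implementation would call the combinatorial algorithm
for the original problem with the modified weights. The names `PricingResult` and
`priceOut` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result of one pricing round: index of the best column in the
// listed family and its reduced cost.
struct PricingResult {
    std::size_t bestIndex;
    double reducedCost;
};

// Brute-force stand-in for the column generation subproblem (11):
// maximize (w^T - u^T A) chi^F - v over an explicitly listed family.
// family[k] is the 0/1 characteristic vector of the k-th feasible solution,
// A is the m x |E| matrix of the linking constraints, and u, v are the dual
// values of the restricted linear program (10).
PricingResult priceOut(const std::vector<std::vector<int>>& family,
                       const std::vector<double>& w,
                       const std::vector<std::vector<double>>& A,
                       const std::vector<double>& u,
                       double v)
{
    assert(!family.empty());
    const std::size_t nE = w.size();

    // Modified weights: wMod_e = w_e - sum_i u_i * A_ie.
    std::vector<double> wMod(w);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t e = 0; e < nE; ++e)
            wMod[e] -= u[i] * A[i][e];

    PricingResult best{0, 0.0};
    bool first = true;
    for (std::size_t k = 0; k < family.size(); ++k) {
        double value = -v;
        for (std::size_t e = 0; e < nE; ++e)
            value += wMod[e] * family[k][e];
        if (first || value > best.reducedCost) {
            best = {k, value};
            first = false;
        }
    }
    return best;
}
```

If the returned reduced cost is positive, the corresponding column would be added
to the restricted program and the program would be solved again; otherwise the
current solution is optimal for the full subset formulation.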






The column generation algorithm is typically preferable to a cutting plane
algorithm applied to the system (7) when the optimization for $\mathcal{P}$ is simpler or
more effectively solvable than the corresponding separation problem. Even though
optimization and separation are polynomially equivalent, it is usually more
effective to use a polynomial time combinatorial algorithm rather than a cutting
plane algorithm. This is, for example, true for the weighted matching problem,
where state of the art combinatorial algorithms perform much better than
cutting plane algorithms based on the Padberg-Rao separation procedure for
the blossom inequalities. Going to NP-hard problems, it is known that certain
instances of the binary knapsack problem can be solved effectively by dynamic
programming or by some simple enumerative scheme, while it is quite unlikely
that the same instances can be solved equally fast by polyhedral cutting plane
methods, if they can be solved at all without resorting to enumeration.

The scheme we have discussed so far can be generalized to the more
interesting case where k combinatorial optimization problems are given on k
disjoint ground sets. The variables associated with the elements of all the ground
sets are then linked together by some additional constraints. Because of these
additional constraints not all k-tuples of feasible solutions (one taken from each
problem) are feasible for the whole problem. Many interesting problems can be
formulated in this way. For example, the binary cutting stock problem described
in Section 2, where each of the k problems is a knapsack, is of this type. Another
example is given by the capacitated vehicle routing problem, where each of the
k building blocks is a minimum length cycle problem with side constraints. For
the sake of simplicity, we will not consider this more general setting here and we
will stick to the simple case, where k = 1, developed so far.

If the solution $y_{\mathcal{X}}$ of (10) is not integer, we have to resort to one of the two
main operations of branch-and-cut: either we branch on a fractional variable
$y_F$ or we add a new valid inequality that cuts the point $y_{\mathcal{X}}$ away. Both these
operations present some difficulties when the linear programming relaxation is
solved by a column generation algorithm.

Suppose we branch on variable $y_F$, so we do a variable setting that is
equivalent to adding the constraint $y_F = 0$ to the problem (10), and we solve it
again. Then we must check if any variables corresponding to feasible solutions in
$\mathcal{F} \setminus \mathcal{X}$ have positive reduced cost. However, while without variable setting all
variables associated with the set $\mathcal{X}$ have non-positive reduced cost, now $y_F$
does have positive reduced cost and $F \in \mathcal{X}$. Therefore, it may be the case that
the column generation subproblem selects exactly F as the new element to be put
in the set $\mathcal{X}$. To avoid this deadlock, in the column generation subproblem we
have to look for the second best solution or, when in general i - 1 variables have
been already set, for the i-th best solution. However, finding the i-th best
solution is in general more difficult than simply finding an optimum one. There
are polynomially solvable problems for which the i-th best version is NP-hard.
Thus branching may produce inefficiencies in the column generation subproblem.

To avoid such inefficiencies, some “ad hoc” branching strategies have been
proposed in the literature for specific column generation subproblems. For
example, in the case when $\mathcal{P}$ is a knapsack problem, Vance et al. [68] propose
a branching strategy that leaves the structure of the column generation
subproblem almost unchanged. Nevertheless, even in this nice case, there are some
unfortunate situations where the subproblem becomes considerably harder and
needs to be solved by a general IP optimizer.

It is even more difficult to add cutting planes to the linear program. The
program (10), with the addition of the integrality requirement for the variables
y, is an integer linear program for which several techniques exist to generate a
valid inequality $\bar{c}_{\mathcal{X}}^T y_{\mathcal{X}} \le d$ that cuts the solution $y_{\mathcal{X}}$ away. However, in
order to exploit this cutting plane to strengthen the current linear programming
relaxation within a column generation scheme, the following two conditions must
be satisfied:

a) For each $F \notin \mathcal{X}$ it must be easy to compute the lifting coefficient $\bar{c}_F$.

b) The column generation subproblem must keep the original structure, or, at
   least, must maintain the same level of difficulty after the cut has been added.

Condition a) is quite difficult to satisfy in general. Moreover, even in case the
lifting coefficient can be computed, i.e., when we can find a function
$\bar{c}: \mathcal{F} \to \mathbb{R}$ that associates a coefficient with every feasible solution of
$\mathcal{F}$, the situation may not be tractable yet. After introducing the inequality
$\bar{c}^T y \le d$ into the linear program, denoting by t its corresponding dual variable
in the new optimum solution, the column generation subproblem becomes

$$
\begin{aligned}
\text{maximize} \quad & (w^T - u^T A) \chi^F - t \bar{c}_F - v \\
\text{subject to} \quad & F \in \mathcal{F}.
\end{aligned}
\tag{12}
$$

The complexity of such a problem depends on the function $\bar{c}$ and it is conceivably
hard in most of the cases.

For these reasons the cut generation phase never takes place when the linear
programming relaxation is solved with column generation techniques. Such a
simplified version of branch-and-cut where only new columns are added to the
constraint matrix, and never new rows, is called branch-and-price and has been
successfully applied to solve some large problems. It is evident, though, that such
a technique can produce good results only when the gap between the objective
function value of the integral optimum solution and the optimum of the linear
programming relaxation (10) is sufficiently small. When this is not the case, the
lack of the cutting plane phase may induce poor performance of the algorithm.

To overcome these difficulties, Felici, Gentile, and Rinaldi have proposed 
the following method that makes it possible to use the branch-and-cut method- 
ology at its full power. 

Instead of choosing one of the ground set and the subset formulations, the 
solution algorithm considers both of them. The latter is used to solve the current 
linear programming relaxation while the former is used to find violated cuts and 
to do the branching. Let us see in more detail how this “double-formulation” 
algorithm works. 
We solve the current linear program (10) and we find the solution $y_{\mathcal{X}}$. Then
we translate $y_{\mathcal{X}}$ into a feasible solution x for the system (7) by

$$
x = \sum_{F \in \mathcal{X}} \chi^F y_F.
$$

It is easy to see that x optimizes the linear function $w^T x$ over the system (7).
Now we assume that x is not integral and that we are able to solve the separation
problem for this point with respect to the convex hull of the integral solutions of
(7). Let $c^T x \le d$ be the inequality identified by the separation algorithm. Then
the inequality $\bar{c}_{\mathcal{X}}^T y_{\mathcal{X}} \le d$ is generated using the following map that assigns a
coefficient to each feasible solution in $\mathcal{F}$:

$$
\bar{c}_F = c^T \chi^F.
\tag{13}
$$

The inequality $\bar{c}_{\mathcal{X}}^T y_{\mathcal{X}} \le d$ is valid for all the integral solutions of the linear
program (9) and is violated by $y_{\mathcal{X}}$. Thus, it is added to the constraint matrix of
the linear program (10). Observe that by (13) we can now compute the lifting
coefficient $\bar{c}_F$ for each $F \in \mathcal{F}$. Moreover, due to the special form of the lifting
coefficients, the column generation subproblem after adding the new constraint,
whose associated dual variable is denoted by t, becomes

$$
\begin{aligned}
\text{maximize} \quad & \bar{w}_F - u^T \bar{A}_F - t \bar{c}_F - v \\
& = w^T \chi^F - u^T A \chi^F - t c^T \chi^F - v \\
& = (w^T - u^T A - t c^T) \chi^F - v \\
\text{subject to} \quad & F \in \mathcal{F}
\end{aligned}
\tag{14}
$$

which differs from the original subproblem (11) only in the objective function.
As the structure of this subproblem remains unchanged after adding the new
inequality, the column generation phase and the cutting plane phase can be
integrated without interfering with each other. The branching phase can be
handled analogously: if the current solution x has a fractional component, say
$x_e$, we generate two new problems to which we add the two (non-valid) inequalities
$x_e \le 0$ and $x_e \ge 1$, respectively. Branching on inequalities rather than on
variables is treated in the same way. The counterpart of these inequalities in the
corresponding subset formulation is produced using (13).
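
The two translations between the formulations are simple enough to sketch in code.
The following C++ fragment is an illustration only, with hypothetical names
(`toGroundSet`, `liftCoefficient`, `liftedLhs`) and dense vectors chosen for
brevity; it is not the authors' implementation. It maps a subset-formulation
solution to a ground-set point and lifts a ground-set cut $c^T x \le d$ to the subset
formulation via (13). Since $c^T x = \sum_F y_F \, c^T \chi^F$, both forms of the cut have the
same left-hand side value, and hence the same violation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Column = std::vector<int>;  // 0/1 characteristic vector chi^F

// Ground set point x = sum over the current columns of y_F * chi^F.
Vec toGroundSet(const std::vector<Column>& columns, const Vec& y)
{
    assert(!columns.empty() && columns.size() == y.size());
    Vec x(columns.front().size(), 0.0);
    for (std::size_t k = 0; k < columns.size(); ++k)
        for (std::size_t e = 0; e < x.size(); ++e)
            x[e] += y[k] * columns[k][e];
    return x;
}

// Lifting coefficient (13): cbar_F = c^T chi^F.
double liftCoefficient(const Vec& c, const Column& chiF)
{
    assert(c.size() == chiF.size());
    double value = 0.0;
    for (std::size_t e = 0; e < c.size(); ++e)
        value += c[e] * chiF[e];
    return value;
}

// Left-hand side of the lifted cut, cbar_X^T y_X, over the current columns.
double liftedLhs(const Vec& c, const std::vector<Column>& columns, const Vec& y)
{
    assert(columns.size() == y.size());
    double lhs = 0.0;
    for (std::size_t k = 0; k < columns.size(); ++k)
        lhs += liftCoefficient(c, columns[k]) * y[k];
    return lhs;
}
```

Because the coefficient of a column depends only on its characteristic vector, a
column generated later receives its coefficient in every stored cut by the same
call to `liftCoefficient`.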

In conclusion, using this technique, a full branch-and-cut scheme can be used
even when columns are dynamically added to the constraint matrix via a column
generation process that extracts them from an exponentially large set. To stress
this aspect, we use the name branch-and-cut-and-price for this version of the
algorithm.

In [30] Gentile, Felici, and Rinaldi used a branch-and-cut-and-price algorithm
to produce optimum or nearly optimum solutions to a complex mixed integer
problem arising in the scheduling of ships for product distribution in the oil
industry. The model originates from k distinct combinatorial problems whose
variables are linked by additional constraints.
5 Design and Usage of ABACUS

In an earlier section we introduced a generic algorithmic framework for
branch-and-cut (and price) algorithms that is common to almost all software
implementations of such algorithms. In contrast to this generality, for a long time
almost all implementations of branch-and-cut algorithms were written from
scratch. In order to overcome the tremendous waste of time spent on repeatedly
implementing similar computer codes sharing the same algorithmic techniques, the
software system ABACUS (A Branch And CUt System) has been developed. From an
early predecessor that was a C library developed by Michael Jünger and Gerhard
Reinelt in the late eighties, the system underwent several minor and two complete
revisions, until Stefan Thienel developed ABACUS as an object oriented
system in his doctoral thesis.

ABACUS provides a framework for implementation of branch-and-bound 
algorithms using linear programming relaxations that can be completed with 
the dynamic generation of cutting planes or columns. This system allows the 
software developer to concentrate merely on the problem specific parts, i.e., the 
cutting plane part, the column generation, and the heuristics. Moreover ABA- 
CUS provides a variety of general algorithmic concepts, e.g., enumeration and 
branching strategies, from which the user of the system can choose the best al- 
ternative for his application. If the provided features do not suffice or there are 
better problem specific techniques it is easy to extend the system. Finally, ABA- 
CUS provides many basic data structures and useful tools for implementation of 
extensions. ABACUS is designed both for general mixed integer problems and 
combinatorial optimization problems but is especially useful for combinatorial 
optimization. Other systems (like BC-OPT, MINTO, Cplex [17], and
XPress [69]) emphasize mixed integer optimization.

Simple reuse of code and design of abstract data structures are essential for
a modern software framework. These requirements are met by object oriented
programming techniques. Therefore ABACUS was implemented as a C++ class
library, i.e., a collection of C++ classes. ABACUS makes extensive use of the
object oriented features of C++ and is not merely a C++ adaptation of C code. In
order to understand its design and use, some experience with C++ is a prerequisite.
Since C++ has become one of the most popular programming languages and
will probably not lose its popularity in the near future, choosing C++ was a
reasonable decision.

5.1 Basic Design Ideas 

From the point of view of a user, who wants to implement a linear programming 
based branch-and-bound algorithm, ABACUS provides a small system of base 
classes from which the application specific classes can be derived. All problem 
independent parts are “invisible” for the user such that one can concentrate on 
the problem specific algorithms and data structures. 

The basic ideas are pure virtual functions, virtual functions, and virtual
dummy functions. A pure virtual function has to be implemented in a class
derived by the user of the framework, e.g., the initialization of the
branch-and-bound tree with a subproblem associated with the application. For
virtual functions, default implementations are provided. They are often useful for
a large number of applications, but can be redefined if required. The branching
strategy is a good example. Finally, virtual dummy functions are virtual
functions that do nothing in their default implementations, but can be redefined in
derived classes, e.g., the separation of cutting planes. They are not pure virtual
functions as their definition is not required for the correctness of the algorithm.
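
The three kinds of functions can be illustrated with a small C++ sketch. The class
and function names below (`SubproblemBase`, `MySub`, `MyCuttingSub`,
`processNode`) and the strategy name returned by the default are hypothetical and
chosen for illustration; they are not the ABACUS declarations.

```cpp
#include <cassert>
#include <string>

// Hypothetical framework base class (not the actual ABACUS interface).
class SubproblemBase {
public:
    virtual ~SubproblemBase() = default;

    // Pure virtual function: every application must define it.
    virtual bool feasible() const = 0;

    // Virtual function with a default implementation that may be redefined.
    virtual std::string branchingStrategy() const { return "close-to-half"; }

    // Virtual dummy function: does nothing by default and reports that
    // no cutting planes were generated.
    virtual int separate() { return 0; }
};

// Minimal application class: only the pure virtual function is defined.
class MySub : public SubproblemBase {
public:
    bool feasible() const override { return true; }
};

// Refined application class: the dummy is replaced by a (fake) separation.
class MyCuttingSub : public MySub {
public:
    int separate() override { return 3; }  // pretend three cuts were found
};

// Framework-side code only sees the base class interface.
int processNode(SubproblemBase& sub)
{
    int cuts = sub.separate();
    assert(cuts >= 0);
    return cuts;
}
```

Here `MySub` already yields a complete (cut-free) algorithm, and `MyCuttingSub`
adds separation later without touching the framework code, which mirrors the
stepwise refinement of an application.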

Moreover, an application based on ABACUS can be refined step by step. 
Only the derivation of a few new classes and the definition of some pure virtual 
functions is required to get a branch-and-bound algorithm running. Then, this 
branch-and-bound algorithm can be enhanced by the dynamic generation of 
constraints and/or variables, heuristics, or the implementation of new branching 
or enumeration strategies. 

Default strategies are available for numerous parts of the branch-and-bound 
algorithm, and they can be controlled via a parameter file. If none of the system 
strategies meets the requirements of the application, the default strategy can 
simply be replaced by the redefinition of a virtual function in a derived class. 



5.2 Structure of the System 

The inheritance graph of any set of classes in C++ must be a directed acyclic
graph. Very often these inheritance graphs form forests or trees. The inheritance
graph of ABACUS is also designed as a tree, with a single exception where
multiple inheritance is used.

Basically, the classes of ABACUS can be divided into three different main
groups shown in Table 2. The application base classes are the most important
ones for the user. From these classes the user of the framework has to derive the 
classes for her or his applications. The pure kernel classes are usually invisible 
for the user. To this group belong, e.g., classes for supporting the branch-and- 
bound algorithm, for the solution of linear programs, and for the management of 
constraints and variables. Finally, there are the auxiliaries, i.e., classes providing 
basic data structures and tools that can optionally be used for the implementa- 
tion of an application. 



Table 2. The classes of ABACUS.

Pure Kernel         Application Base    Auxiliaries
----------------    ----------------    ---------------------
Linear Program      Master              Basic Data Structures
Pool                Subproblem          Tools
Branch-and-Bound    Constraints
                    Variables
Fig. 14. Inheritance relations between ABACUS kernel classes and user defined
derivations.



5.3 Essential Classes for an Application

In this section we describe the ABACUS classes that are usually involved in the
derivation process for the implementation of a new application. We do not give
their description in the form of a manual by describing each member of the class
(this is done in [1]), but we try to explain the concepts, design ideas, and usage.
Fig. 14 shows the class graph of the application base classes and the derived
application specific classes (represented by white boxes with bold font).



The Root of the Class-Tree. It is well known that global variables, constants,
or functions, can cause a lot of problems within a big software system. This is 
even worse for frameworks such as ABACUS that are used by other program- 
mers and may be linked together with other libraries. Here, name conflicts and 
undesired side effects are almost inevitable. 

All functions and enumerations that might be used by all other classes are
embedded in the class ABA_ABACUSROOT. This class is used as a base class for
all classes within the system. Currently, ABA_ABACUSROOT implements only an
enumeration with the different exit codes of the framework and implements
some public member functions. The most important one of them is the function
exit() that calls the system function exit(). This construction turns out to be
very helpful for debugging purposes.



The Master. In an object oriented implementation of a linear programming
based branch-and-bound algorithm we require one object that controls the
optimization, in particular the enumeration and resource limits, and stores data
that can be accessed from any other object involved in the optimization of a
specific instance. This task is performed by the class ABA_MASTER (which is not
identical with the root node of the enumeration tree). For each application of
ABACUS we have to derive a class from ABA_MASTER implementing problem
specific “global” data and functions.

Every object that requires access to this “global” information stores a pointer
to the corresponding object of the class ABA_MASTER. This holds for almost all
classes of the framework. For example, the class ABA_SUB, implementing a
subproblem of the branch-and-bound tree, has as a member a pointer to an object
of the class ABA_MASTER:

    class ABA_SUB {
        ABA_MASTER *master_;
    };

Then, within a member function of the class ABA_SUB, we can access, e.g., the
global upper bound by calling

    master_->upperBound();

where upperBound() is a member function of the class ABA_MASTER.
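
A self-contained sketch of this pattern, with hypothetical stand-in classes
`Master` and `Sub` (not the ABACUS classes, whose interfaces are much richer),
could look as follows. It assumes a minimization problem, so that a node may be
discarded once its local lower bound reaches the global upper bound.

```cpp
#include <cassert>

// Stand-in for the controlling object that holds the "global" data.
class Master {
public:
    explicit Master(double upperBound) : upperBound_(upperBound) {}
    double upperBound() const { return upperBound_; }
    void upperBound(double value) { upperBound_ = value; }
private:
    double upperBound_;
};

// Stand-in for a subproblem: it keeps a pointer to the master instead of
// relying on global variables.
class Sub {
public:
    explicit Sub(Master* master) : master_(master) { assert(master != nullptr); }

    // In a minimization problem the node can be discarded if its local lower
    // bound is not better than the global upper bound.
    bool canBePruned(double localLowerBound) const {
        return localLowerBound >= master_->upperBound();
    }
private:
    Master* master_;
};
```

All subproblems created with the same master see an improved upper bound
immediately, without any global variable being involved.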

Within a specific application there are always some global data members such as
the output and error streams, zero tolerances, a big number representing
“infinity”, and some functions related to these data. Instead of implementing
these data directly in the class ABA_MASTER, ABACUS has an extra class ABA_GLOBAL,
from which the class ABA_MASTER is derived. The reason is that there are several
classes, especially some basic data structures, that might be useful in programs
that are not branch-and-bound algorithms. To simplify their reuse, these classes
have a pointer to an object of the class ABA_GLOBAL instead of one to an object
of the class ABA_MASTER.

Branch-and-Bound Data and Functions. The class ABA_MASTER augments the
data inherited from the class ABA_GLOBAL with specific data members and
functions for branch-and-bound. It has objects of classes as members that store the
list of subproblems that still must be processed in the implicit enumeration (class
ABA_OPENSUB), and that store the variables that might be fixed by reduced cost
criteria in later iterations (class ABA_FIXCAND). Moreover, the solution history,
timers for parts of the optimization, and a lot of other statistical information are
stored within the class ABA_MASTER.

The class ABA_MASTER also provides default implementations of pools for the
storage of constraints and variables. We explain the details later in this section.

A branch-and-bound framework requires a flexible way for defining
enumeration strategies. The corresponding virtual functions are defined in the class
ABA_MASTER, but for a better understanding we explain this concept below, when
we discuss the data structure for the open subproblems.

Limits on the Optimization Process. The control of limits on the optimization
process, e.g., the amounts of CPU time and wall-clock time, and the size of the
enumeration tree, is performed by members of the class ABA_MASTER during the
optimization process. The quality guarantee of the solution is also monitored by
the class ABA_MASTER.
The Initialization of the Branch-and-Bound Tree. When the optimization is
started, the root node of the branch-and-bound tree must be initialized with
an object of the class ABA_SUB. However, the class ABA_SUB is an abstract class,
from which a class implementing the problem specific features of the subproblem
optimization is derived. Therefore, the initialization of the root node is performed
by a pure virtual function that must return a pointer to an object of a class
derived from the class ABA_SUB. This function must be defined by a problem
specific class derived from the class ABA_MASTER.
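
This is an instance of the factory method idiom. The sketch below illustrates it
with hypothetical classes (`Master`, `Sub`, `TspMaster`, `TspSub`) and a
hypothetical function name `firstSub`; for memory safety it returns a
`std::unique_ptr` rather than the raw pointer described in the text, which is a
deliberate deviation made only for this example.

```cpp
#include <cassert>
#include <memory>
#include <string>

class Master;

// Abstract subproblem: problem specific classes are derived from it.
class Sub {
public:
    explicit Sub(Master* master) : master_(master) {}
    virtual ~Sub() = default;
    virtual std::string name() const = 0;
protected:
    Master* master_;
};

// The framework-side master cannot construct the root node itself, because
// it does not know the problem specific subproblem class.
class Master {
public:
    virtual ~Master() = default;

    // Pure virtual factory for the root node of the enumeration tree.
    virtual std::unique_ptr<Sub> firstSub() = 0;

    // Framework code: create the root node and report its type name.
    std::string initializeTree() {
        root_ = firstSub();
        assert(root_ != nullptr);
        return root_->name();
    }
private:
    std::unique_ptr<Sub> root_;
};

// Problem specific classes supplied by the application.
class TspSub : public Sub {
public:
    explicit TspSub(Master* master) : Sub(master) {}
    std::string name() const override { return "TspSub"; }
};

class TspMaster : public Master {
public:
    std::unique_ptr<Sub> firstSub() override {
        return std::unique_ptr<Sub>(new TspSub(this));
    }
};
```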

The Sense of the Optimization. For simplification, software systems for
minimization and maximization problems use internally only one sense of the
optimization, e.g., minimization. Within a framework this strategy is dangerous,
because if we access internal results, e.g., the reduced costs, from an application,
we might misinterpret them. Therefore, ABACUS also works internally with
the true sense of optimization. The value of the best known feasible solution is
denoted primal bound, the value of a linear programming relaxation is denoted
dual bound if all variables price out correctly. The functions lowerBound() and
upperBound() interpret the primal or dual bound, respectively, depending on
the sense of the optimization. An equivalent method is also used for the local
bounds of the subproblems.
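
The mapping between primal/dual bounds and lower/upper bounds can be sketched as
follows. The class `Bounds` and the enumeration `Sense` are hypothetical names
introduced for this illustration only.

```cpp
#include <cassert>

enum class Sense { Min, Max };

// Stores bounds in terms of "primal" and "dual" and translates them to
// "lower" and "upper" according to the true sense of the optimization.
class Bounds {
public:
    Bounds(Sense sense, double primal, double dual)
        : sense_(sense), primal_(primal), dual_(dual)
    {
        // The dual bound can never be worse than the primal bound.
        assert(sense == Sense::Min ? dual <= primal : dual >= primal);
    }

    // Minimization: the relaxation value bounds the optimum from below.
    // Maximization: the best feasible value bounds the optimum from below.
    double lowerBound() const { return sense_ == Sense::Min ? dual_ : primal_; }

    // Minimization: the best feasible value bounds the optimum from above.
    // Maximization: the relaxation value bounds the optimum from above.
    double upperBound() const { return sense_ == Sense::Min ? primal_ : dual_; }

    // True if a new feasible solution value improves the primal bound.
    bool betterPrimal(double value) const {
        return sense_ == Sense::Min ? value < primal_ : value > primal_;
    }
private:
    Sense sense_;
    double primal_;
    double dual_;
};
```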



The Subproblem. The class ABA_SUB represents a subproblem of the implicit 
enumeration, i.e., a node of the branch-and-bound tree. It is an abstract class, 
from which a problem specific subproblem can be derived. In this derivation pro- 
cess problem specific functions can be added, e.g., for the generation of variables 
or constraints. 

The Root Node of the Branch-and-Bound Tree. For the root node of the
optimization, the constraint and variable sets can be initialized explicitly. By
default, the first linear program is solved with the barrier method followed by a
crossover to a basic solution, but a flexible mechanism for the selection of the
LP-method is provided.

The Other Nodes of the Branch-and-Bound Tree. As long as only globally valid
constraints and variables are used, it would be correct to initialize the constraint
and variable system of a subproblem with the system of the previously processed
subproblem. However, ABACUS is designed also for locally valid constraints
and variables. Therefore, each subproblem inherits the final constraint and
variable system of the father node in the enumeration tree. This system might be
modified by the applied branching rule. Moreover, this approach avoids tedious
recomputations and makes sure that heuristically generated constraints do not
get lost.

If conventional branching strategies, like setting a binary variable, changing
the bounds of an integer variable, or even adding a branching constraint, are
applied, then the basis of the last solved linear program of the father is still dual
feasible. As the basis status of the variables and slack variables is stored, phase 1
of the simplex method can be avoided if we use the dual simplex method. If, due
to another branching method, e.g., for branch-and-price algorithms, the dual
feasibility of the basis is lost, another LP-method can be used.

Branch-and-Bound. A linear programming based branch-and-bound algorithm
in its simplest form is obtained if the linear programming relaxations solved in
each subproblem are enhanced neither by the generation of cutting planes
nor by the dynamic generation of variables. Such an algorithm requires only two
problem specific functions: one to check if a given LP-solution is a feasible
solution of the optimization problem, and one for the generation of the sons.

The first function is problem specific, because, if constraints of the integer
programming formulation are violated, the condition that all discrete variables
have integer values is not sufficient. For safety, this function is declared pure
virtual. The second required problem specific function is usually only a
one-liner that returns the problem specific subproblem generated by a branching
rule. Hence, the implementation of a pure branch-and-bound algorithm does not
require very much effort.
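
To show how little is needed, here is a minimal, self-contained sketch of a pure
LP-based branch-and-bound for the 0/1 knapsack problem. It does not use ABACUS:
the LP relaxation is solved by the greedy ratio rule (valid for a knapsack with
positive weights, which is assumed), and the names `solveRelaxation`,
`feasible`, `generateSon`, and `branchAndBound` are hypothetical. The two
problem specific functions discussed above appear as `feasible` and
`generateSon`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Item { double profit; double weight; };   // weights assumed positive
struct Node { std::vector<int> fixed; };         // -1 free, 0 or 1 if set
struct LpResult { bool solvable; double value; std::vector<double> x; };

// LP relaxation of the knapsack problem under the settings of a node,
// solved greedily by decreasing profit/weight ratio.
LpResult solveRelaxation(const std::vector<Item>& items, double capacity,
                         const Node& node)
{
    const std::size_t n = items.size();
    LpResult r{true, 0.0, std::vector<double>(n, 0.0)};
    double room = capacity;
    std::vector<std::size_t> order;
    for (std::size_t i = 0; i < n; ++i) {
        if (node.fixed[i] == 1) {
            r.x[i] = 1.0; r.value += items[i].profit; room -= items[i].weight;
        } else if (node.fixed[i] == -1) {
            order.push_back(i);
        }
    }
    if (room < 0.0) { r.solvable = false; return r; }
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return items[a].profit * items[b].weight > items[b].profit * items[a].weight;
    });
    for (std::size_t i : order) {
        if (room <= 0.0) break;
        double take = std::min(1.0, room / items[i].weight);
        r.x[i] = take; r.value += take * items[i].profit;
        room -= take * items[i].weight;
    }
    return r;
}

// First problem specific function: here feasibility means integrality.
bool feasible(const std::vector<double>& x)
{
    for (double v : x)
        if (v > 1e-9 && v < 1.0 - 1e-9) return false;
    return true;
}

// Second problem specific function: a son created by setting one variable.
Node generateSon(const Node& father, std::size_t var, int value)
{
    Node son = father;
    son.fixed[var] = value;
    return son;
}

double branchAndBound(const std::vector<Item>& items, double capacity)
{
    assert(capacity >= 0.0);
    double best = 0.0;  // the empty knapsack is always feasible
    std::vector<Node> open{Node{std::vector<int>(items.size(), -1)}};
    while (!open.empty()) {
        Node node = open.back();
        open.pop_back();
        LpResult lp = solveRelaxation(items, capacity, node);
        if (!lp.solvable || lp.value <= best) continue;    // fathom the node
        if (feasible(lp.x)) { best = lp.value; continue; } // new incumbent
        std::size_t var = 0;                               // fractional variable
        while (lp.x[var] <= 1e-9 || lp.x[var] >= 1.0 - 1e-9) ++var;
        open.push_back(generateSon(node, var, 0));
        open.push_back(generateSon(node, var, 1));
    }
    return best;
}
```

For the items (60, 10), (100, 20), (120, 30) and capacity 50, the root relaxation
is fractional in the third variable, and after branching the search ends with the
second and third items selected.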

The Optimization of the Subproblem. The core of the class ABA_SUB is its opti- 
mization by a cutting plane algorithm. As dynamically generated variables are 
dual cuts, we also use the notion cutting plane algorithm for a column generation 
algorithm. By default, the cutting plane algorithm only solves the LP-relaxation 
and tries to fix and set by reduced costs. Within the cutting plane algorithm 
four virtual dummy functions for the separation of constraints, for the pricing 
of variables, for the application of LP-based heuristics, and for fixing variables 
by logical implications are called. These virtual functions can be redefined in a 
problem specific class derived from the class ABA_SUB. In addition to the manda- 
tory pricing phase before the fathoming of a subproblem, the inactive variables 
are priced out every k iterations in a branch-and-cut-and-price algorithm. Other
strategies for the separation/pricing decision can be implemented by the redefi- 
nition of a virtual function. 

Adding Constraints. Cutting planes may not only be generated in the function 
separate() but also in other functions of the cutting plane phase. E.g., for the
maximum cut problem it is useful if the generation of cutting planes is also pos- 
sible in the LP-based heuristic. If not all constraints of the integer programming 
formulation are active, then it might be necessary to solve a separation problem 
also for the feasibility test. Therefore, the generation of cutting planes is allowed 
in every subroutine of the cutting plane algorithm. 

Adding Variables. As for constraints, the generation of variables is allowed
everywhere in the subproblem optimization. 

Buffering New Constraints and Variables. New constraints and variables are not 
immediately added to the subproblem, but stored in buffers and added at the 
beginning of the next iteration. We present the details of this concept below. 

Removing Constraints and Variables. In order to avoid corrupting the linear
program and the sets of active constraints and variables, and to allow the removal
of variables and constraints in any subroutine of the cutting plane phase, the
variables and constraints to be removed are buffered as well. The removal is executed
before constraints and variables are added at the beginning of the next iteration of the
cutting plane algorithm.
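
The buffering scheme can be sketched as follows. ActiveSet and its members are hypothetical names chosen for illustration, items are plain strings, and the real system of course updates the linear program as well; only the ordering (removals first, then additions, both at the start of the next iteration) is taken from the text.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical miniature of an active set with add and remove buffers.
class ActiveSet {
public:
    // Both operations are deferred: they only fill a buffer.
    void add(const std::string& item) { addBuffer_.push_back(item); }
    void remove(const std::string& item) { removeBuffer_.push_back(item); }

    // Called at the beginning of an iteration of the cutting plane
    // algorithm: buffered removals are executed before buffered additions.
    void beginIteration() {
        for (const std::string& r : removeBuffer_)
            items_.erase(std::remove(items_.begin(), items_.end(), r), items_.end());
        removeBuffer_.clear();
        items_.insert(items_.end(), addBuffer_.begin(), addBuffer_.end());
        addBuffer_.clear();
    }

    const std::vector<std::string>& items() const { return items_; }

private:
    std::vector<std::string> items_;         // currently active items
    std::vector<std::string> addBuffer_;     // items waiting to be added
    std::vector<std::string> removeBuffer_;  // items waiting to be removed
};
```

Because nothing changes between two calls of beginIteration(), any subroutine may call add() or remove() without invalidating the set it is currently iterating over.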

Moreover, a default function for the removal of constraints according to the 
value or the basis status of the slack variables is provided. Variables can be 
removed according to the value of the reduced costs. These operations can be 
controlled by parameters and the corresponding virtual functions can be rede- 
fined if other criteria should be applied. ABACUS tries to remove constraints 
also before a branching step is performed. 

The Active Constraints and Variables. In order to allow a flexible combination 
of constraint and variable generation, every subproblem has its own set of active 
constraints and variables, that are represented by the generic class ABA_ACTIVE. 
By default, the variables and the constraints of the last solved linear program of 
the father of the subproblem are inherited. Therefore, the local constraint and 
variable sets speed up the optimization. 

Together with the active constraints and variables ABACUS also stores in 
every subproblem the LP-statuses of the variables and slack variables, the upper 
and lower bounds of the variables, and whether a variable is fixed or set.

The Linear Program. Every subproblem has its own linear program that is only 
set up for an active subproblem. Of course, the initialization of the linear program
at the beginning and its deletion at the end of the subproblem optimization cost
some running time compared with maintaining a single global
linear program stored in the master. Our current computational
experience shows that this overhead is small and pays off because bookkeeping
is greatly simplified and parallelization becomes easy.

The LP-Method. Currently, three different methods are available in state-of-the-art
LP-solvers: the primal simplex method, the dual simplex method, and the
barrier method in combination with crossover techniques for the determination
of an optimum basic solution. The choice of the method can be essential for the 
performance. If a primal feasible basis is available, the primal simplex method 
is often the right choice. If a dual feasible basis is available, the dual simplex 
method is usually preferred. And finally, if no basis is known, or the linear 
programs are very large, often the barrier methods yield the best running times. 

Therefore, by default a linear program is solved by the barrier method, if it is 
the first linear program solved in the root node or constraints and variables have 
been added at the same time, by the primal simplex method, if constraints have 
been removed or variables have been added, and by the dual simplex method, 
if constraints have been added, or variables have been removed, or it is the first 
linear program of a subproblem that is not the root node. 

However, it should be possible to add problem specific decision criteria.
Again, a virtual function gives us full flexibility. We keep control of when this function
is invoked, namely at the point when all decisions concerning addition and
removal of constraints and variables have been taken. The function has as ar- 
guments the correct numbers of added and removed constraints and variables. 
If we want to choose the LP-method problem specifically, then we can redefine 
this function in a class derived from the class ABA_SUB. 
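
A minimal sketch of such a decision function is given below. The names LpMethod, LpChange, and chooseLpMethod are hypothetical, the order in which the rules are tested is an assumption (the text does not say which rule wins when several apply), and falling back to the dual simplex method when nothing has changed is also an assumption.

```cpp
#include <cassert>

enum class LpMethod { Primal, Dual, Barrier };

// Hypothetical summary of what changed since the last solved LP.
struct LpChange {
    int nAddCons = 0;            // constraints added
    int nRemCons = 0;            // constraints removed
    int nAddVars = 0;            // variables added
    int nRemVars = 0;            // variables removed
    bool firstLpOfRoot = false;  // first LP solved in the root node
};

class Sub {
public:
    virtual ~Sub() = default;

    // Default rule as described in the text; precedence is assumed.
    virtual LpMethod chooseLpMethod(const LpChange& c) const {
        if (c.firstLpOfRoot || (c.nAddCons > 0 && c.nAddVars > 0))
            return LpMethod::Barrier;
        if (c.nRemCons > 0 || c.nAddVars > 0)
            return LpMethod::Primal;
        // Constraints added, variables removed, or first LP of a
        // subproblem that is not the root (and, by assumption, no change).
        return LpMethod::Dual;
    }
};

// Problem specific redefinition: never use the barrier method.
class NoBarrierSub : public Sub {
public:
    LpMethod chooseLpMethod(const LpChange& c) const override {
        LpMethod m = Sub::chooseLpMethod(c);
        return m == LpMethod::Barrier ? LpMethod::Primal : m;
    }
};
```

The framework would call chooseLpMethod() once per iteration with the final counts, so a derived class only has to express its own preference.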

Generation of Non-liftable Constraints. If constraint and variable generation are 
combined, then the active constraints must be lifted if a variable is added, i.e., 
the column of the new variable must be computed. This lifting cannot always
be done in a straightforward way; it can even require the solution of another
optimization problem. Moreover, lifting is not only required when a variable is 
added, but this problem has to be attacked already during the solution of the 
pricing problem. 

In order to allow the usage of constraints that cannot be lifted or for which 
the lifting cannot be performed efficiently, ABACUS provides a management of 
non-liftable constraints. Each constraint has a flag indicating whether it is liftable. If the pricing
routine is called and non-liftable constraints are active, then all non-liftable 
constraints are removed, the linear programming relaxation is solved again, and 
the cutting plane algorithm is continued before the pricing phase is reentered. 
In order to avoid an infinite repetition of this process we forbid the further 
generation of non-liftable constraints during the rest of the optimization of this 
subproblem. 

Reoptimization. If the root of the remaining branch-and-bound tree changes, 
but the new root has been processed earlier, then it can be advantageous to op- 
timize the corresponding subproblem again, in order to get improved conditions 
for fixing variables by reduced costs. Therefore, ABACUS provides the reopti- 
mization of a subproblem. The difference to the ordinary optimization is that 
no branching is finally performed even if the subproblem is not fathomed. If it 
turns out during the reoptimization that the subproblem is fathomed, then all 
subproblems contained in the subtree rooted at this subproblem are fathomed. 

Branching. Virtual functions for the flexible definition of branching strategies 
are implemented in the class ABA_SUB. We explain them below.

Memory Allocation. Since constraints and variables are added and removed dy- 
namically, ABACUS provides a dynamic memory management system, that 
requires no user interaction. If there is not enough memory to add a constraint 
or variable, memory reallocations are performed automatically. As the reallo- 
cation of the local data, in particular of the linear program, can require a lot 
of CPU time, if it is performed regularly, some extra space is allocated for the 
addition of variables and constraints, and for the nonzero entries of the matrix 
of the LP-solver. 

Activation and Deactivation. In order to save memory, ABACUS sets up those
data structures that are only required if the subproblem is active, e.g., the linear 
program, at the beginning of the subproblem optimization, and frees the memory 
again when the subproblem becomes inactive. We observed that the additional 
CPU time required for these operations is negligible, but the memory savings 
are significant. 

Constraints and Variables. Motivated by linear programming duality, com- 
mon features of constraints and variables are embedded in a joint base class 
ABA_CONVAR.

Constraint/Variable versus Row/Column. Usually, the notions constraint and
row, and the notions variable and column, respectively, are used equivalently. 
We also followed this terminology so far since, e.g., the notion column gener- 
ation algorithm is more common than variable generation algorithm. Within 
ABACUS, constraints and rows are different items. Constraints are stored in 
the pool, and a subproblem has a set of active constraints. Only when a constraint is
added to the linear program is the corresponding row computed. More precisely,
a row is a representation of a constraint associated with a certain variable
set. 

The subtour elimination constraints for the TSP provide a good example for
the usefulness of this differentiation: Storing such an inequality

$$\sum_{i \in S,\; j \in V \setminus S} x_{ij} \ge 2$$

would require storing all edges (variables) in $\delta(S)$ with space requirement $O(|S|^2)$.
But storing the elements in $S$ requires only $O(|S|)$ space. Given a variable $x_e$
associated with edge $e$, the coefficient of the subtour elimination constraint is
1 if $e \in \delta(S)$ and 0 otherwise. Thus not only the generation of the constraint
for a given set of active variables, but also the determination of the lifting
coefficients for other variables during pricing or variable generation is easy in
this case.

Efficient memory management and dynamic variable generation are the rea- 
son why ABACUS distinguishes between constraints and rows. Each constraint 
must have a member function that returns the coefficient for a variable such that 
we can determine the row corresponding to a set of variables. 
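
The relation between a constraint and its row can be sketched as follows. All names (Edge, SparseRow, Constraint, coeff, genRow, SubtourConstraint) are hypothetical and only mirror the idea of the text: the constraint stores the node set S, and a row is produced on demand for whatever variable set is currently active.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

struct Edge { int u, v; };  // a TSP variable, represented by its end nodes

// Sparse row: positions in the active variable set and their coefficients.
struct SparseRow {
    std::vector<std::size_t> index;
    std::vector<double> coeff;
};

class Constraint {
public:
    virtual ~Constraint() = default;

    // The one function every constraint must provide.
    virtual double coeff(const Edge& e) const = 0;

    // Row of this constraint with respect to a given set of variables.
    SparseRow genRow(const std::vector<Edge>& activeVars) const {
        SparseRow row;
        for (std::size_t j = 0; j < activeVars.size(); ++j) {
            double c = coeff(activeVars[j]);
            if (c != 0.0) {
                row.index.push_back(j);
                row.coeff.push_back(c);
            }
        }
        return row;
    }
};

// Subtour elimination constraint x(delta(S)) >= 2, stored as the set S.
class SubtourConstraint : public Constraint {
public:
    explicit SubtourConstraint(std::set<int> s) : s_(std::move(s)) {}

    // 1 if exactly one end node of e lies in S, i.e. e is in delta(S).
    double coeff(const Edge& e) const override {
        bool uIn = s_.count(e.u) > 0;
        bool vIn = s_.count(e.v) > 0;
        return uIn != vIn ? 1.0 : 0.0;
    }

private:
    std::set<int> s_;
};
```

The same coeff() function also yields the lifting coefficient of an edge that is priced in later, so no separate lifting code is needed for this constraint class.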

In these considerations “constraint” can also be replaced by “variable” and
“row” by “column”. A column is the representation of a variable corresponding
to a given constraint set. Again, we use the TSP as an example. A variable for 
the TSP corresponds to an edge in a graph. Hence, it can be represented by its 
end nodes. The column associated with this variable consists of the coefficients 
of the edge for all active constraints. 

ABACUS implements these concepts in the classes ABA_CONSTRAINT and
ABA_VARIABLE that are used for the representation of active constraints and variables
and for the storage of constraints and variables in the pools, and ABA_ROW
and ABA_COLUMN that are used in the interface to the LP-solver.

Common Features of Constraints and Variables. Constraints and variables have
several common features that are represented in a common base class. A constraint/variable
is active if it belongs to the constraint/variable set of an active
subproblem. An active constraint/variable must not be removed from its pool.
Besides being active there can be other reasons why a constraint/variable should
not be deleted from its pool, e.g., if the constraint/variable has just been generated,
then it is put into a buffer, but is not yet activated. In such a case we
want to set a lock on the constraint so that it cannot be removed (we explain the
details below).

ABACUS also distinguishes between dynamic variables/constraints and 
static ones. As soon as a static variable/constraint becomes active it cannot be 
deactivated. An example for static variables are the variables in a general mixed 
integer optimization problem, examples for static constraints are the constraints 
of the problem formulation of a general mixed integer optimization problem 
or the degree constraints of the TSP. Dynamic constraints are usually cutting 
planes. Column generation algorithms feature dynamic variables. 

A crucial point in the implementation of a special variable or constraint class 
is the tradeoff between performance and memory usage. It has been observed 
that a memory efficient storage format can be one of the keys to the solution of 
larger instances. Such formats are in general not very useful for the computa- 
tion of the coefficient of a single variable/constraint. Moreover, if the coefficients 
of a constraint for several variables or the coefficients of a variable for several 
constraints have to be computed, e.g., when the row/column format of the con- 
straint/variable is generated in order to add it to the LP-solver, then these 
operations can become a bottleneck. However, given a different format, using 
more memory, it might be possible to perform these operations more efficiently. 

Therefore, ABACUS provides a compressed format and an expanded format 
of a constraint/variable. Before a large number of time consuming coefficient 
computations is performed, the system tries to generate the expanded format, 
and afterwards the constraint/variable is compressed. The implementation of
the expansion and compression is optional. 

We use again the subtour elimination constraint of the TSP as an example for
the compressed and expanded format. For an inequality $\sum_{i \in S,\, j \in V \setminus S} x_{ij} \ge 2$
we store the nodes of the set $S$ in the compressed format. The computation of
the coefficient of an edge $e = (u, v)$ requires $O(|S|)$ time and space. As expanded
format we use an array inSubtour of type bool of length n (n is the number of
nodes of the graph) and inSubtour[v] is true if and only if $v \in S$. Now, we can
determine the coefficient of an edge (variable) in constant time.
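
The two formats can be sketched as follows. Only the array name inSubtour comes from the text; the class layout and the member names expand, compress, and coeff are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Subtour elimination constraint with a compressed and an expanded format.
class SubtourConstraint {
public:
    // nodes: the set S; nNodes: number of nodes n of the graph.
    SubtourConstraint(std::vector<int> nodes, int nNodes)
        : nodes_(std::move(nodes)), n_(nNodes) {}

    // Build the expanded format: inSubtour_[v] is true iff v is in S.
    void expand() {
        inSubtour_.assign(static_cast<std::size_t>(n_), false);
        for (int v : nodes_) inSubtour_[static_cast<std::size_t>(v)] = true;
        expanded_ = true;
    }

    // Drop the expanded format again to save memory.
    void compress() {
        inSubtour_.clear();
        inSubtour_.shrink_to_fit();
        expanded_ = false;
    }

    bool expanded() const { return expanded_; }

    // Coefficient of edge (u, v): 1 if exactly one end node lies in S.
    double coeff(int u, int v) const {
        return contains(u) != contains(v) ? 1.0 : 0.0;
    }

private:
    bool contains(int v) const {
        assert(v >= 0 && v < n_);
        if (expanded_) return inSubtour_[static_cast<std::size_t>(v)];  // O(1)
        return std::find(nodes_.begin(), nodes_.end(), v) != nodes_.end();  // O(|S|)
    }

    std::vector<int> nodes_;       // compressed format: the nodes of S
    int n_;                        // number of nodes of the graph
    std::vector<bool> inSubtour_;  // expanded format, empty when compressed
    bool expanded_ = false;
};
```

A caller that needs many coefficients, for instance when generating a row, would call expand() first and compress() afterwards; both formats return the same coefficients.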



Constraints. ABACUS provides all three different types of constraints: equations,
≤-inequalities, and ≥-inequalities. The only pure virtual function is the
computation of a coefficient of a variable. It is used to generate the row format 
of a constraint, to compute the slack of an LP-solution, and to check if an LP- 
solution violates a constraint. All these functions are declared virtual such that 
they can be redefined for performance enhancements. 










If variables are generated dynamically, ABACUS distinguishes between lift- 
able and non-liftable constraints. Non-liftable constraints must be removed be- 
fore the pricing problem can be solved. 

Variables. ABACUS supports continuous, integer, and binary variables in the 
class ABA_VARIABLE. Each variable has a lower and an upper bound that can be 
set to plus/minus infinity if the variable is unbounded. We also record whether a
variable is fixed.

The corresponding functions have their dual analogs in the class ABA_CONSTRAINT.
The only pure virtual function is now the function that returns a
coefficient in a constraint. With this function the generation of the column format 
and the computation of the reduced cost can be performed. We say a variable is 
violated if it does not price out correctly. 

Constraint and Variable Pools. Every constraint and variable either induced 
by the problem formulation or generated in a separation or pricing step is stored 
in a pool. A pool is a collection of constraints and variables. We will see later 
that it is profitable to keep separate pools for variables and constraints. Then, 
we will also discuss when it is useful to have different pools for different types 
of constraints or variables. But for simplicity we assume now that there is only 
one variable pool and one constraint pool. 

There are two reasons for the usage of pools: saving memory and an additional 
separation/pricing method. 

A constraint or variable usually belongs to the set of active constraints or 
variables of several subproblems that still have to be processed. Hence, it is 
useful to store in the sets of active constraints or variables only pointers to 
each constraint or variable that is stored at some central place, i.e., in a pool 
that is a member of the corresponding master of the optimization. Our practical 
experiments show that this memory sensitive storage format is of very high 
importance, since already this pool format uses a large amount of memory. 

Pool-Separation/Pricing. From the point of view of a single subproblem a pool
may not only contain active but also inactive constraints or variables. The inac- 
tive items can be checked in the separation or pricing phase, respectively. We call 
these techniques pool-separation and pool-pricing. Again, motivated by duality 
theory we use the notion “separation” also for the generation of variables, i.e., 
for pricing. Pool-separation is advantageous in two cases. First, pool-separation 
might be faster than the direct generation of violated constraints or variables. 
In this case, we usually check the pool for violated constraints or variables, and 
only if no item is generated, we use the more time consuming direct methods. 
Second, pool-separation turns out to be advantageous, if a class of constraints or 
variables can be separated/priced out only heuristically. In this case, it can hap- 
pen that the heuristic cannot generate the constraint or variable although it is 
violated. However, earlier in the optimization process this constraint or variable 
might have been generated. In this case the constraint or variable can be regener- 
ated from the pool. Computational experiments in m show that this additional 
separation or pricing method can decrease the running time significantly. 

Pool-separation is also one of the reasons why it can be helpful to provide
several constraint or variable pools. E.g., some constraints might be more impor- 
tant during the pool-separation than other constraints. In this case, we might 
check this “important” pool first and only if we fail in generating any item we 
might proceed with other pools or continue immediately with direct separation 
techniques. 

Other classes of constraints or variables might be less important in the sense 
that they cannot or can only very seldom be regenerated from the pool (e.g.,
locally valid constraints or variables). Such items could be kept in a pool that 
immediately removes all items that do not belong to the active constraint or 
variable set of any subproblem that still has to be processed. A similar strategy 
might be required for constraints or variables requiring a large amount of memory.

Finally, there are constraints for which it is advantageous to stay active in 
any case (e.g., the constraints of the problem formulation in a general mixed 
integer optimization problem, or the degree constraints for the TSP). Also for 
these constraints separate pools are useful. 

Garbage Collection. In any case, as soon as a lot of constraints or variables are 
generated dynamically we can observe that the pools become very large. In the 
worst case this might cause an abnormal termination of the program if it runs 
out of memory. But already earlier the optimization process might be slowed 
down since pool-separation takes too long. Of course, the second point can be 
avoided by limited strategies in pool-separation, which we will discuss later. But 
to avoid the first problem we require suitable cleaning up and garbage collection 
strategies. 

The simplest strategy is to remove, in a garbage collection process, all items
that do not belong to the active variable or constraint set of any active or open
subproblem. The disadvantage of this strategy might be that good items are removed
that just happen to be inactive at the moment. A more sophisticated strategy might
be counting the number of linear programs or subproblems where this item has
been active and removing initially only items with a small counter.

Unfortunately, if the enumeration tree grows very large or if the number of 
constraints and variables that are active at a single subproblem is high, then 
even the above brute force technique for the reduction of a pool turns out to be 
insufficient. 

Hence, ABACUS divides constraints and variables into two groups. On the 
one hand the items that must not be removed from the pool, e.g., the constraints 
and variables of the problem formulation of a general mixed integer optimization 
problem, and on the other hand those items that can either be regenerated in 
the pricing or separation phase or are not important for the correctness of the 
algorithm, e.g., cutting planes. If we use the data structures we will describe 
now, then we can safely remove an item of the second group.

Pool Slots. So far, we have assumed that the subproblems store pointers to 
variables or constraints, respectively, which are stored in pools. If we remove 
the variable or constraint, i.e., delete the memory we have allocated for this 
object, then errors will occur if we access the removed item from a subproblem.
ABACUS contains a data structure that can avoid this problem very efficiently. 

A pool is not a collection of constraints or variables, but a collection of 
pool slots (class ABA_POOLSLOT). Each slot stores a pointer to a constraint or
variable or a 0-pointer if it is void. The sets of active constraints or variables 
in subproblems consist of pointers to the corresponding slots instead of storing 
pointers to the constraints or variables directly. If a constraint or variable has 
been removed, a 0-pointer will be found in the slot and the subproblem recognizes 
that the constraint or variable must be eliminated since it cannot be regenerated. 
The disadvantage of this method is that eventually our program may run out of
memory since there are many useless slots. 

In order to avoid this problem, a version number is added as data member to 
each pool slot. Initially the version number is 0 and becomes 1 if a constraint or 
variable is inserted in the slot. After an item in a slot is deleted, a new item can 
be inserted into the slot. Each time a new item is stored in the slot the version 
number is incremented. The sets of active constraints and variables do not only 
store pointers to the corresponding slots but also the version number of the slot 
when the pointer is initialized. If a member of the active constraints or variables 
is accessed we compare its original and current version number. If these numbers 
are not equal we know that this is not the constraint or variable we were originally 
pointing to and remove it from the active set. We call the data structure storing 
the pointer to the pool slot and the original version number a reference to a pool 
slot (class ABA_POOLSLOTREF). Hence, the sets of active constraints and variables
are arrays of references to pool slots. This pool concept is illustrated in Fig. 15.
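
The slot and reference mechanism can be sketched in a few lines. PoolSlot and PoolSlotRef are hypothetical simplifications and not the ABACUS classes; in particular, ownership through std::unique_ptr is an assumption of this sketch.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// A pool slot owns at most one item and counts how often it was filled.
template <class Item>
class PoolSlot {
public:
    // Store a new item; the version number is incremented each time.
    void insert(std::unique_ptr<Item> item) {
        item_ = std::move(item);
        ++version_;
    }

    // Delete the item; the slot becomes void and may be reused later.
    void remove() { item_.reset(); }

    Item* item() const { return item_.get(); }  // null pointer if void
    unsigned long version() const { return version_; }

private:
    std::unique_ptr<Item> item_;
    unsigned long version_ = 0;  // 0 until the first item is inserted
};

// A reference remembers the slot and the version seen at initialization.
template <class Item>
class PoolSlotRef {
public:
    explicit PoolSlotRef(PoolSlot<Item>& slot)
        : slot_(&slot), version_(slot.version()) {}

    // The referenced item, or a null pointer if it was removed or if the
    // slot has meanwhile been reused for a different item.
    Item* item() const {
        return slot_->version() == version_ ? slot_->item() : nullptr;
    }

private:
    PoolSlot<Item>* slot_;
    unsigned long version_;
};
```

A subproblem holding a stale reference thus obtains a null pointer instead of a wrong or dangling item and can drop the entry from its active set.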

Standard Pool. The class ABA_POOL is an abstract class that does not specify
the storage format of the collection of pool slots. The simplest implementation
is an array of pool slots. The set of free pool slots can be implemented by a
linked list. This concept is realized in the class ABA_STANDARDPOOL. Moreover, an
ABA_STANDARDPOOL can be static or dynamic. A dynamic ABA_STANDARDPOOL is
automatically enlarged when it is full, an item is to be inserted, and the cleaning up
procedure fails. A static ABA_STANDARDPOOL has a fixed size and no automatic
reallocation is performed. More sophisticated implementations might keep an 
order of the pool slots such that “important” items are detected earlier in a 
pool-separation and a limited pool-separation might be sufficient. A criterion 
for this order could be the number of subproblems where this constraint or 
variable is active or has been active. 

Default Pools. The number of the pools is very problem specific and depends 
mainly on the separation and pricing methods. Since in many applications a pool 
for variables, a pool for the constraints of the problem formulation, and a pool 
for cutting planes are sufficient, ABACUS implements this default concept. 
If not specified differently these default pools are used in the initialization of 
the pools, in the addition of variables and constraints, and in the pool-pricing 
and pool-separation. ABACUS uses a static ABA_STANDARDPOOL for the default
constraint and cutting planes pools. The default variable pool is a dynamic
ABA_STANDARDPOOL, because the correctness of the algorithm requires that a
variable that does not price out correctly can be added in any case, whereas the
loss of a cutting plane that cannot be added due to a full pool has no effect
on the correctness of the algorithm as long as it does not belong to the integer
programming formulation.

Fig. 15. Schematic illustration of the pool/poolslot concept.

If the default pool concept is replaced by an application specific pool concept,
the user of the framework must make sure that there is at least one variable pool 
and one constraint pool and these pools are embedded in a class derived from 
the class ABA_MASTER.

With this concept ABACUS provides high flexibility: an easy-to-use default
implementation that can be changed by the redefinition of virtual functions
and the application of non-default function arguments. All classes involved in 
this pool concept are designed as generic classes such that they can be used both 
for variables and constraints. 



Linear Programs. Since ABACUS is a framework for the implementation of 
linear programming based branch-and-bound algorithms, it is obvious that the 
solution of linear programs plays a central role, and we require a class concept 
for the representation of linear programs. Moreover, linear programs might not 
only be used for the solution of LP-relaxations in the subproblems, but they can 
also be used for other purposes, e.g., within heuristics for the determination of 
good feasible solutions in mixed integer programming. 

Therefore, ABACUS provides two basic interfaces for a linear program. The 
first one is in a very general form for linear programs defined by a constraint 
matrix stored in some sparse format. The second one is designed for the solution
of the LP-relaxations in a subproblem. The main differences to the first interface 
are that the constraint matrix is stored in the abstract variable/constraint format 
instead of the column/row format and that fixed and set variables are eliminated. 

Another important design criterion is that the solution of the linear programs
should be independent of the LP-solver used, and plugging in a new LP-solver
should be simple.



The Basic Interface. The result of these requirements is the class hierarchy
of Fig. 16. The class ABA_LP is an abstract base class providing the public functions
that are usually expected: initialization, optimization, addition of rows and
columns, deletion of rows and columns, access to the problem data, the solution,
the slack variables, the reduced costs, and the dual variables. These functions do
some minor bookkeeping and call a pure virtual function having the same name
but starting with an underscore (e.g., optimize() calls _optimize()). These
functions starting with an underscore are exactly the functions that have to be
implemented by an LP-solver.
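
The pairing of a public function with an underscore function can be sketched as follows. Only the names optimize() and _optimize() are taken from the text; the classes Lp and DummySolverLp, the Status type, and the counter are hypothetical.

```cpp
#include <cassert>

// Hypothetical miniature of a solver independent LP base class.
class Lp {
public:
    enum class Status { Optimal, Infeasible, Unbounded };

    virtual ~Lp() = default;

    // Public function: minor bookkeeping, then the solver specific call.
    Status optimize() {
        ++nOpt_;
        lastStatus_ = _optimize();
        return lastStatus_;
    }

    int nOpt() const { return nOpt_; }              // number of optimizations
    Status lastStatus() const { return lastStatus_; }

protected:
    // To be implemented by the interface class of a concrete LP-solver.
    virtual Status _optimize() = 0;

private:
    int nOpt_ = 0;
    Status lastStatus_ = Status::Optimal;  // meaningful only if nOpt_ > 0
};

// Stand-in for a solver interface; a real one would call the solver here.
class DummySolverLp : public Lp {
protected:
    Status _optimize() override { return Status::Infeasible; }
};
```

Code that works with an Lp reference never sees which solver is behind it, which is what makes plugging in a new solver a local change.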



The LP-Solvers Cplex, XPress, and SoPlex. The classes ABA_LPSUBCPLEX,
ABA_LPSUBXPRESS, and ABA_LPSUBSOPLEX implement these solver specific functions
for the LP-solvers Cplex, XPress and SoPlex. If a linear program should
be solved with Cplex, an object of the class ABA_LPSUBCPLEX is instantiated. 
Only public members that are inherited from the class ABA_LP are used, except 
the constructors. Using another LP-solver only requires implementing a class for this
solver analogous to ABA_LPSUBCPLEX and then replacing the name Cplex by the solver's
name in the instantiation.

Fig. 16. Inheritance structure of Linear Programming classes.



Linear Programming Relaxations. The most important linear programs within 
this system are the LP-relaxations that arise during the optimization of the sub- 
problems. However, the active constraints and variables of a subproblem are not 
stored in the format required by the class ABA_LP. Therefore, we have to imple- 
ment a transformation from the variable/constraint format to the column/row 
format. This is done in the virtual functions genRow() and genColumn() of
ABA_CONSTRAINT and ABA_VARIABLE.

The transformation is not invoked by the class ABA_LP but by an interface 
class ABA_LPSUB. This class works like a preprocessor for the linear programs 
solved in the subproblem. Fixed and set variables can be eliminated from the 
linear program submitted to the solver. Whether all fixed and set variables should
be eliminated depends on the solution method used. If the simplex method is used
and a basis is known, then only non-basic fixed and set variables should be elim- 
inated. If the barrier method is used, we can eliminate all fixed and set variables. 
The encapsulation of the interface between the subproblem and the class ABA_LP 
supports a more flexible adaption of the elimination to other LP-solvers in the 
future and also enables us to use other LP-preprocessing techniques, e.g., con- 
straint elimination, or changing the bounds of variables under certain conditions, 
without modifying the variables and constraints in the subproblem. Preprocess- 
ing techniques other than elimination of fixed and set variables are currently not 
implemented. 



Solving Linear Programming Relaxations with Cplex, XPress, and SoPlex. The 
subproblem optimization in the class ABA_SUB uses only the public functions of 
the class ABA_LPSUB, which is again an abstract class independent of the used LP- 
solver. A linear program solving the relaxations within a subproblem with, e.g., 
the LP-solver Cplex, is defined by the class ABA_LPSUBCPLEX, which is derived 
from the classes ABA_LPSUB and ABA_CPLEXIF. The class ABA_LPSUBCPLEX only
implements a constructor that passes the arguments to the base classes. Using a 
different LP-solver in this context requires the definition of a class equivalent to 
the class ABA_LPSUBCPLEX and a redefinition of the virtual function ABA_LPSUB 
*generateLp(), which is a one-line function allocating an object of the class 
ABA_LPSUBCPLEX and returning a pointer to this object. 

Therefore, it is easy to use different LP-solvers for different ABACUS applications
and it is also possible to use different LP-solvers in a single ABACUS
application. For instance, if there is a very fast method for the solution of the 
linear programs in the root node of the enumeration tree, but all other lin- 
ear programs should be solved by Cplex, then only a simple modification of 
ABA_SUB::generateLp() is required.
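
Such a modification might look like the following sketch. Only the function name generateLp() is taken from the text; the classes LpSub, LpSubSolverA, LpSubSolverB, Sub, and MySub are hypothetical, the convention that the root has level 1 is an assumption, and std::unique_ptr is used here instead of the raw pointer mentioned in the text.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical solver independent interface seen by the subproblem.
class LpSub {
public:
    virtual ~LpSub() = default;
    virtual std::string solverName() const = 0;
};

class LpSubSolverA : public LpSub {
public:
    std::string solverName() const override { return "A"; }
};

class LpSubSolverB : public LpSub {
public:
    std::string solverName() const override { return "B"; }
};

class Sub {
public:
    explicit Sub(int level) : level_(level) {}
    virtual ~Sub() = default;

    int level() const { return level_; }  // assumption: the root has level 1

    // Default: every linear program is solved with solver A.
    virtual std::unique_ptr<LpSub> generateLp() const {
        return std::make_unique<LpSubSolverA>();
    }

private:
    int level_;
};

// Application specific subproblem: solver B in the root, A elsewhere.
class MySub : public Sub {
public:
    using Sub::Sub;

    std::unique_ptr<LpSub> generateLp() const override {
        if (level() == 1) return std::make_unique<LpSubSolverB>();
        return Sub::generateLp();
    }
};
```

The rest of the subproblem optimization is unaffected, since it only talks to the returned object through the LpSub interface.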



Auxiliary Classes for Branch-and-Bound. In this section we are going to
discuss the design of some important classes that support the linear programming 
based branch-and-bound algorithm. These are classes for the management of the 
open subproblems, for buffering newly generated constraints and variables, and 
for the implementation of branching rules. 

The Set of Open Subproblems. During a branch-and-bound algorithm subprob- 
lems are dynamically generated in branching steps and later optimized. There- 
fore, we require a data structure that stores pointers to all unprocessed subprob- 
lems and supports the insertion and the extraction of a subproblem. 

One of the important issues in a branch-and-bound algorithm is the enu- 
meration strategy, i.e., which subproblem is extracted from the set of open sub- 
problems for further processing. It would be possible to implement the different 
classical enumeration strategies, like depth-first search, breadth-first search, or 
best-first search within this class. But then an application-specific enumeration 
strategy could not be added in a simple way by a user of ABACUS. Of course, 
with the help of inheritance and virtual functions a technique similar to the one 
for the usage of different LP-solvers for the subproblem optimization could be 
applied. However, there is a much simpler solution for this problem. 

In the class ABA_MASTER a virtual member function is defined that compares 
two subproblems according to the selected enumeration strategy and returns 
-1 if the first subproblem has higher priority, 1 if the second one has higher 
priority, and 0 if both subproblems have equal priority. Application-specific 
enumeration strategies can be integrated by a redefinition of this virtual function. 
This comparison function of the associated master is called in order to compare 
two subproblems within the extraction operation of the class ABA_OPENSUB. 

The class ABA_OPENSUB implements the set of open subproblems as a doubly 
linked linear list. Each time another subproblem is required for further 
processing, the complete list is scanned and the best subproblem according to the 
applied enumeration strategy is extracted. This implementation has the additional 
advantage that it is very easy to change the enumeration strategy during 
the optimization process, e.g., to perform a diving strategy that uses best-first 
search but performs a limited depth-first search every k iterations. The drawback 
of this implementation is the linear running time of the extraction of a 
subproblem. If the set of open subproblems were implemented as a heap, 
then the insertion and the extraction of a subproblem would require logarithmic 
time, whereas in the current implementation the insertion requires constant, but 
the extraction requires linear time. But if the enumeration strategy is changed, 
the heap must be reinitialized from scratch, which requires linear time. 
However, it is typical for linear programming based branch-and-bound algorithms 
that a lot of work is performed in the subproblem optimization, but the 
total number of subproblems is comparatively small. The performance analysis 
of our current applications shows that the running time spent in the management 
of the set of open subproblems is negligible. Due to the encapsulation of 
the management of the set of open subproblems in the private part of the class 
ABA_OPENSUB, it will be no problem to change the implementation as soon as it 
is required. 

ABACUS provides four rather common enumeration strategies by default: 
best-first search, breadth-first search, depth-first search, and a simple diving 
strategy performing depth-first search until the first feasible solution is found 
and continuing afterwards with best-first search. 
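
The interplay between the comparison function of the master and the linear-scan extraction can be sketched as follows. This is a minimal illustration under stated assumptions, not the ABACUS implementation: the names Strategy, Subproblem, Master, and OpenSub are hypothetical, a Python list replaces the doubly linked list, and a minimization problem is assumed, so that best-first prefers the smaller lower bound.

```python
from enum import Enum
from typing import List, Optional


class Strategy(Enum):
    BEST_FIRST = 1
    BREADTH_FIRST = 2
    DEPTH_FIRST = 3
    DIVING = 4  # depth-first until a feasible solution is known, then best-first


class Subproblem:
    def __init__(self, name: str, level: int, lower_bound: float) -> None:
        self.name = name
        self.level = level              # depth in the enumeration tree
        self.lower_bound = lower_bound  # bound inherited from the father


class Master:
    """Holds the enumeration strategy, loosely in the role of ABA_MASTER."""

    def __init__(self, strategy: Strategy) -> None:
        self.strategy = strategy
        self.has_feasible_solution = False

    def compare(self, a: Subproblem, b: Subproblem) -> int:
        """Return -1 if a has priority, 1 if b has priority, 0 on a tie."""
        strategy = self.strategy
        if strategy is Strategy.DIVING:
            strategy = (Strategy.BEST_FIRST if self.has_feasible_solution
                        else Strategy.DEPTH_FIRST)
        if strategy is Strategy.BEST_FIRST:
            key_a, key_b = a.lower_bound, b.lower_bound  # smaller bound first
        elif strategy is Strategy.BREADTH_FIRST:
            key_a, key_b = a.level, b.level              # shallower node first
        else:
            key_a, key_b = -a.level, -b.level            # deeper node first
        return (key_a > key_b) - (key_a < key_b)


class OpenSub:
    """Set of open subproblems with a linear-scan extraction."""

    def __init__(self, master: Master) -> None:
        self.master = master
        self._items: List[Subproblem] = []

    def insert(self, sub: Subproblem) -> None:
        self._items.append(sub)

    def extract(self) -> Optional[Subproblem]:
        if not self._items:
            return None
        best = 0
        for i in range(1, len(self._items)):  # scan the complete list
            if self.master.compare(self._items[i], self._items[best]) < 0:
                best = i
        return self._items.pop(best)
```

Since every extraction asks the master anew, the strategy can change between two extractions without reorganizing the container, which is the advantage claimed above over a heap.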

Buffering Generated Variables and Constraints. Usually, new constraints are 
generated in the separation phase. However, it is possible that in some applica- 
tions violated constraints are also generated in other subroutines of the cutting 
plane algorithm. In particular, if not all constraints of the integer programming 
formulation are active in the subproblem, a separation routine might have to be 
called to check the feasibility of the LP-solution. Another example is the maxi- 
mum cut problem, for which it is rather convenient if new constraints can also be 
generated while we try to find a better feasible solution after the linear program 
has been solved. Therefore, it is necessary that constraints can be added by a 
simple function call from any part of the cutting plane algorithm. 

This requirement also holds for variables. For instance, when we perform a 
special rounding algorithm on a fractional solution during the optimization of 
the TSP (see [13]), we may detect useful variables that are currently inactive. It 
should be possible to add such important variables before they may be activated 
in a later pricing step. 

It can happen that so many variables or constraints are generated that 
it is not appropriate to add all of them, but only the “best” ones. Measuring 
what is “best” is difficult. For constraints a possible measure is the slack or the distance 
between the fractional solution and the associated hyperplane; for variables it 
can be the reduced cost. 

Therefore, ABACUS implements a buffer for generated constraints and variables 
in the generic class ABA_CUTBUFFER, which can be used both for variables 
and constraints. There is one object of this class for buffering variables and another 
one for buffering constraints. Constraints and variables that are added during 
the subproblem optimization are not added directly to the linear program and 
the active sets of constraints and variables, but are added to these buffers. The 
size of the buffers can be controlled by parameters. At the beginning of the next 
iteration items out of the buffers are added to the active constraint and variable 
sets and the buffers are emptied. An item added to a buffer can receive an op- 
tional rank given by a floating point number. If all items in a buffer have a rank, 
then the items with maximal rank are added. As the rank is only specified by a 
floating point number, different measurements for the quality of the constraints 
or variables can be applied. The number of added constraints and variables can 
be controlled again by parameters. 
If an item is discarded during the selection of the constraints and variables 
from the buffers, then usually it is also removed from the pool and deleted. 
However, it may happen that these items should be kept in the pool in order to 
regenerate them in later iterations. Therefore, it is possible to set an additional 
flag while adding a constraint or variable to the buffer that prevents it from 
being removed from the pool if it is not added. Constraints or variables that are 
regenerated from a pool receive this flag automatically. 
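
A buffer with optional ranks and a keep-in-pool flag could look roughly like the sketch below. It is not the ABA_CUTBUFFER interface: the class name CutBuffer, its method names, and the behavior of rejecting insertions once the capacity is reached are assumptions made for illustration.

```python
from typing import Generic, List, Optional, Tuple, TypeVar

T = TypeVar("T")


class CutBuffer(Generic[T]):
    """Bounded buffer for generated constraints or variables (hypothetical)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries: List[Tuple[T, Optional[float], bool]] = []

    def insert(self, item: T, rank: Optional[float] = None,
               keep_in_pool: bool = False) -> bool:
        """Buffer an item; return False if the buffer is already full."""
        if len(self._entries) >= self.capacity:
            return False
        self._entries.append((item, rank, keep_in_pool))
        return True

    def extract(self, max_items: int) -> Tuple[List[T], List[T], List[T]]:
        """Empty the buffer at the start of the next iteration.

        Returns (selected, to_delete, kept_in_pool): the items that become
        active, the rejected items that may be removed from the pool, and
        the rejected items whose flag keeps them in the pool.
        """
        entries, self._entries = self._entries, []
        # Ranks are only used if every buffered item carries one.
        if all(rank is not None for _, rank, _ in entries):
            entries = sorted(entries, key=lambda e: e[1], reverse=True)
        selected = [item for item, _, _ in entries[:max_items]]
        rejected = entries[max_items:]
        to_delete = [item for item, _, flag in rejected if not flag]
        kept = [item for item, _, flag in rejected if flag]
        return selected, to_delete, kept
```

Because the rank is just a floating point number, the caller is free to pass slack, distance to the hyperplane, or reduced cost without the buffer knowing which measure is in use.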

Another benefit of this buffering technique is that adding a constraint or 
variable does not immediately change the current linear program and the active 
sets. The update of these data structures is performed at the beginning of the 
cutting plane or column generation algorithm before the linear program is solved. 
Hence, this buffering method, together with the buffering of removed constraints 
and variables, also relieves us of some tedious bookkeeping. 

Branching. A framework for linear programming based branch-and-bound 
algorithms should allow many different branching strategies to be 
embedded. Standard branching strategies are branching on a binary variable by 
setting it to 0 or 1, changing the bounds of an integer variable, or splitting the 
solution space by a hyperplane such that in one subproblem $a^T x \le \beta$ and in the 
other subproblem $a^T x \ge \beta$ must hold. A straightforward generalization is that 
instead of one variable or one hyperplane we use k variables or k hyperplanes, 
which results in a $2^k$-ary instead of a binary enumeration tree. 

Another branching strategy is branching on a set of equations 
$a_1^T x = \beta_1, \ldots, a_l^T x = \beta_l$. Here, $l$ new subproblems are generated by adding one equation to the 
constraint system of the father in each case. Of course, as for any branching 
strategy, the complete set of feasible solutions of the father must be covered by 
the sets of feasible solutions of the generated subproblems. 

It is obvious that we require on the one hand a rather general concept for 
branching that does not only cover all mentioned strategies, but should also be 
extendible to “unknown” methods. 

On the other hand it should be simple for a user of the framework to adapt 
an existing branching strategy like branching on a single variable by adding a 
new branching variable selection strategy. 

Again, an abstract class is the basis for a general branching scheme, and 
overriding a virtual function provides a simple method to change the branching 
strategy. ABACUS uses the concept of branching rules. A branching rule defines 
the modifications of a subproblem for the generation of a son. In a branching step 
as many rules as new subproblems are instantiated. The constructor of a new 
subproblem receives a branching rule. When the optimization of a subproblem 
starts, the subproblem makes a copy of the member data defining its father, i.e., 
the active constraints and variables, and makes the modifications according to 
its branching rule. 

The abstract base class for different branching rules is the class 
ABA_BRANCHRULE, which declares a pure virtual function modifying the subproblem 
according to the branching rule. We have to declare this function in the class 
ABA_BRANCHRULE instead of the class ABA_SUB because otherwise adding a new 
branching rule would require a modification of the class ABA_SUB. 
ABACUS derives from the class ABA_BRANCHRULE classes for branching by 
setting a binary variable (class ABA_SETBRANCHRULE), for branching by changing 
the upper and lower bound of an integer variable (class ABA_BOUNDBRANCHRULE), 
for branching by setting an integer variable to a value (class ABA_VALBRANCHRULE), 
and for branching by adding a new constraint (class ABA_CONBRANCHRULE). 

This concept of branching rules should allow almost every branching scheme. 
In particular, it is independent of the number of generated sons of a subproblem. 
Further branching rules can be implemented by deriving new classes from the 
class ABA_BRANCHRULE and defining the pure virtual function for the correspond- 
ing modification of the subproblem. 

In order to simplify changing the branching strategy, ABACUS implements 
the generation of branching rules in a hierarchy of virtual functions of the class 
ABA_SUB. By default, the branching rules are generated by branching on a single 
variable. If a different branching strategy is implemented, a virtual function must 
be redefined in a class derived from the class ABA_SUB. 
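
The branching-rule concept can be sketched as follows. The Python classes below only mirror the roles of ABA_BRANCHRULE, ABA_SETBRANCHRULE, ABA_BOUNDBRANCHRULE, and ABA_SUB; the method names apply, activate, and generate_sons are hypothetical, and a subproblem is reduced to a dictionary of variable bounds.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple

Bounds = Dict[str, Tuple[int, int]]


class BranchRule(ABC):
    """Describes how a son differs from its father."""

    @abstractmethod
    def apply(self, sub: "Sub") -> None:
        """Modify the subproblem according to the rule."""


class SetBranchRule(BranchRule):
    """Sets a binary variable to 0 or 1."""

    def __init__(self, var: str, value: int) -> None:
        self.var, self.value = var, value

    def apply(self, sub: "Sub") -> None:
        sub.bounds[self.var] = (self.value, self.value)


class BoundBranchRule(BranchRule):
    """Changes the lower and upper bound of an integer variable."""

    def __init__(self, var: str, lower: int, upper: int) -> None:
        self.var, self.lower, self.upper = var, lower, upper

    def apply(self, sub: "Sub") -> None:
        sub.bounds[self.var] = (self.lower, self.upper)


class Sub:
    """Subproblem that receives its branching rule in the constructor."""

    def __init__(self, bounds: Optional[Bounds] = None,
                 father: Optional["Sub"] = None,
                 rule: Optional[BranchRule] = None) -> None:
        self.father = father
        self.rule = rule
        self.bounds: Bounds = dict(bounds or {})

    def activate(self) -> None:
        """Called when optimization starts: copy the father, apply the rule."""
        if self.father is not None and self.rule is not None:
            self.bounds = dict(self.father.bounds)
            self.rule.apply(self)

    def generate_sons(self, var: str) -> List["Sub"]:
        """Default strategy: one rule per son, branching on a single variable."""
        return [Sub(father=self, rule=SetBranchRule(var, v)) for v in (0, 1)]
```

A new kind of branching only needs another BranchRule subclass; Sub itself stays untouched, which is the reason given above for declaring the modifying function in the rule class.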



6 Exercise: Implementation of a Simple TSP Solver 

The exercise that we gave at the end of our teaching unit at the school consisted 
of the implementation of an ABACUS program that solves the TSP exactly, 
given a routine for reading the problem data in TSPLIB format m and the 
minimum capacity cut solver of Jünger et al. [42]. Much to our satisfaction 
the participants took up the challenge with enthusiasm, and, finally, we had 
12 correct TSP solvers. A sample solution by Stefan Thienel is available as a 
technical report that can be found on the web at: 

http://www.informatik.uni-koeln.de/ls_juenger/projects/abacus/abacus_tutorial.ps.gz 
Abstract. Branch, cut, and price (BCP) is an LP-based branch and 
bound technique for solving large-scale discrete optimization problems 
(DOPs). In BCP, both cuts and variables can be generated dynamically 
throughout the search tree. The ability to handle constantly changing 
sets of cuts and variables allows these algorithms to undertake the so- 
lution of very large-scale DOPs; however, it also leads to interesting 
implementational challenges. These lecture notes, based on our experi- 
ence over the last six years with implementing a generic framework for 
BCP called SYMPHONY (Single- or Multi-Process Optimization over 
Networks), address these challenges. They are an attempt to summarize 
some of what we and others have learned about implementing BCP, both 
sequential and parallel, and to provide a useful reference for those who 
wish to use BCP techniques in their own research. 

SYMPHONY, the software from which we have drawn most of our expe- 
rience, is a powerful, state-of-the-art library that implements the generic 
framework of a BCP algorithm. The library’s modular design makes it 
easy to use in a variety of problem settings and on a variety of hard- 
ware platforms. All library subroutines are generic — their implementa- 
tion does not depend on the problem-setting. To develop a full-scale 
BCP algorithm, the user has only to specify a few problem-specific 
methods such as cut generation. The vast majority of the computa- 
tion takes place within a “black box,” of which the user need have no 
knowledge. Within the black box, SYMPHONY performs all the normal 
functions of branch and cut — tree management, LP solution, cut pool 
management, as well as inter-process communication (if parallelism is 
used). Source code and documentation for SYMPHONY are available at 
http://branchandcut.org/SYMPHONY 



1 Introduction 

Since the inception of optimization as a recognized field of study in mathematics, 
researchers have been both intrigued and stymied by the difficulty of solving 
many of the most interesting classes of discrete optimization problems. Even 
combinatorial problems, though conceptually easy to model as integer programs, 
have long remained challenging to solve in practice. The last two decades have 
seen tremendous progress in our ability to solve large-scale discrete optimization 
problems. These advances have culminated in the approach that we now call 
branch and cut, a technique (see [45,72,50]) which brings the computational 
tools of branch and bound algorithms together with the theoretical tools of 
polyhedral combinatorics. In 1998, Applegate, Bixby, Chvátal, and Cook used 
this technique to solve a Traveling Salesman Problem instance with 13,509 cities, 
a full order of magnitude larger than what had been possible just a decade earlier 
[2] and two orders of magnitude larger than the largest problem that had been 
solved up until 1978. This feat becomes even more impressive when one realizes 
that the number of variables in the standard formulation for this problem is 
approximately the square of the number of cities. Hence, we are talking about 
solving a problem with roughly 100 million variables. 
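
As a rough check of this figure, assume that the standard formulation has one variable per undirected edge of the complete graph on $n = 13{,}509$ cities (an assumption on our part; the text only says "approximately the square"). Then the count is

```latex
\[
  \binom{n}{2} = \frac{n(n-1)}{2} = \frac{13509 \cdot 13508}{2}
  = 91{,}239{,}786 \approx 9.1 \times 10^{7},
  \qquad
  n^{2} = 182{,}493{,}081 \approx 1.8 \times 10^{8}.
\]
```

Either way the order of magnitude is $10^8$, in line with "roughly 100 million".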

There are several reasons for this impressive progress. Perhaps the most im- 
portant is the dramatic increase in available computing power over the last 
decade, both in terms of processor speed and memory. This increase in the power 
of hardware has subsequently facilitated the development of increasingly sophis- 
ticated software for optimization, built on a wealth of theoretical results. As soft- 
ware development has become a central theme of optimization research efforts, 
many theoretical results have been “re-discovered” in light of their new-found 
computational importance. Finally, the use of parallel computing has allowed 
researchers to further leverage their gains. 

Because of the rapidly increasing sophistication of computational techniques, 
one of the main difficulties faced by researchers who wish to apply these tech- 
niques is the level of effort required to develop an efficient implementation. 
The inherent need for incorporating problem-dependent methods (most notably 
for dynamic generation of variables and constraints) has typically required the 
time-consuming development of custom implementations. Around 1993, this led 
to the development by two independent research groups of software libraries 
aimed at providing a generic framework that users could easily customize for 
use in a particular problem setting. One of these groups, headed by Jünger 
and Thienel, eventually produced ABACUS (A Branch And CUt System) [52] 
[→ Elf/Gutwenger/Jünger/Rinaldi], while the other, headed by the authors 
and Ladányi, produced what was then known as COMPSys (Combinatorial 
Optimization Multi-processing System). After several revisions to enable broader 
functionality, COMPSys became SYMPHONY (Single- or Multi-Process 
Optimization over Networks) [78,76]. A version of SYMPHONY, which we will call 
COIN/BCP, has also been produced at IBM under the COIN-OR project [27]. 
The COIN/BCP package takes substantially the same approach and has the 
same functionality as SYMPHONY, but has extended SYMPHONY's capabilities 
in some areas, as we will point out. 



These lecture notes are based on our experience over the last six years with 
implementing the SYMPHONY framework and using it to solve several clas- 
sical combinatorial optimization problems. At times, we will also draw on our 
experience with the COIN/BCP framework mentioned earlier. What follows is 
intended to summarize some of what we and others have learned about imple- 
menting BCP algorithms and to provide a concise reference for those who wish 
to use branch and cut in their own research. 



2 Related Work 

The past decade has witnessed development of numerous software packages for 
discrete optimization, most of them based on the techniques of branch, cut, 
and price. The packages fall into two main categories: those based on general-purpose 
algorithms for solving mixed integer programs (MIPs) without the use 
of special structure [→ Martin] and those facilitating the use of special structure 
by interfacing with user-supplied, problem-specific subroutines. We will call 
packages in this second category frameworks. There have also been numerous 
special-purpose codes developed for use in particular problem settings. 

Of the two categories, MIP solvers are the most common. Among the many 
offerings in this category are MINTO [70], MIPO [10], bc-opt [22], and SIP 
[→ Martin] [67]. Generic frameworks, on the other hand, are far less numerous. 
The three frameworks we have already mentioned (SYMPHONY, ABACUS, 
and COIN/BCP) are the most full-featured packages available. Several others, 
such as MINTO, originated as MIP solvers but have the capability of utilizing 
problem-specific subroutines. CONCORDE [2], a package for solving the Traveling 
Salesman Problem (TSP), also deserves mention as the most sophisticated 
special-purpose code developed to date [→ Applegate/Bixby/Chvátal/Cook]. 

Other related software includes several frameworks for implementing parallel 
branch and bound. Frameworks for general parallel branch and bound include 
PUBB gg, BoB [Tg, PPBB-Lib |87], and PICO gg. PARINO |6g and FAT- 
COP |2T] are parallel MIP solvers. 

3 Organization of the Manuscript 

In Sect. 4 we briefly describe branch, cut, and price for those readers requiring 
a review of the basic methodology. In Sect. 5 we describe the overall design 
of SYMPHONY without reference to implementational details and with only 
passing reference to parallelism. In Sect. 6 we then move on to discuss details of 
the implementation. In Sect. 7 we touch on issues involved in parallelizing BCP. 
Finally, in Sect. 8 and Sect. 9 we discuss our computational experience, with 
both sequential and parallel versions of the code. In these sections, we describe 
the implementation of solvers for two combinatorial optimization models, the 
Vehicle Routing Problem and the Set Partitioning Problem. We point out and 
explain those features and parameters that have been the most important. We 
also address the effectiveness of parallelism. 
4 Introduction to Branch, Cut, and Price 

In the remainder of this document, we discuss the application of BCP algorithms 
to the solution of discrete optimization problems. A discrete optimization problem 
(DOP) can be broadly defined as that of choosing from a finite set $S$ an 
optimal element $s^*$ that minimizes some given objective function $f : S \to \mathbb{R}$ ($\mathbb{R}$ 
will denote the set of all real numbers, and $\mathbb{Z}$ the set of all integers). DOPs arise 
in many important applications such as planning, scheduling, logistics, telecommunications, 
bioengineering, robotics, and design of intelligent agents, among 
others. Most DOPs are in the complexity class NP-complete, so there is little 
hope of finding provably efficient algorithms [40]. Nevertheless, intelligent search 
algorithms, such as LP-based branch and bound (to be described below), have 
been tremendously successful at tackling these difficult problems [50]. 



4.1 Branch and Bound 

Branch and bound is the broad class of algorithms from which branch, cut, and 
price has evolved. A branch and bound algorithm uses a divide and conquer 
strategy to partition the solution space into subproblems and then optimizes 
individually over each subproblem. For instance, let $S$ be the set of solutions to 
a given DOP, and let $c \in \mathbb{R}^S$ be a vector of costs associated with members of $S$. 
Suppose we wish to determine a least cost member of $S$ and we are given $\hat{s} \in S$, a 
“good” solution determined heuristically. Using branch and bound, we initially 
examine the entire solution space $S$. In the processing or bounding phase, we 
relax the problem. In so doing, we admit solutions that are not in the feasible 
set $S$. Solving this relaxation yields a lower bound on the value of an optimal 
solution. If the solution to this relaxation is a member of $S$ or has cost equal to 
that of $\hat{s}$, then we are done: either the new solution or $\hat{s}$, respectively, is optimal. 
Otherwise, we identify $n$ subsets $S_1, \ldots, S_n$ of $S$ such that $\bigcup_{i=1}^{n} S_i = S$. Each of 
these subsets is called a subproblem; $S_1, \ldots, S_n$ are sometimes called the children 
of $S$. We add the children of $S$ to the list of candidate subproblems (those which 
await processing). This is called branching. 

To continue the algorithm, we select one of the candidate subproblems and 
process it. There are four possible results. If we find a feasible solution better 
than $\hat{s}$, then we replace $\hat{s}$ with the new solution and continue. We may also 
find that the subproblem has no solutions, in which case we discard (prune) it. 
Otherwise, we compare the lower bound for the subproblem to our global upper 
bound, given by the value of the best feasible solution encountered thus far. If it 
is greater than or equal to our current upper bound, then we may again prune 
the subproblem. Finally, if we cannot prune the subproblem, we are forced to 
branch and add the children of this subproblem to the list of active candidates. 
We continue in this way until the list of candidate subproblems is empty, at 
which point our current best solution is, in fact, optimal. 
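
The loop described above can be made concrete on a deliberately tiny model. The sketch below is our own illustration and is not taken from SYMPHONY: it assumes a covering knapsack problem (minimize $cx$ subject to $wx \ge d$, $x$ binary, with positive integer data), whose LP relaxation can be solved greedily without an LP solver. The incumbent starts empty instead of coming from a heuristic, and the function names solve_relaxation and branch_and_bound are hypothetical.

```python
import math
from fractions import Fraction


def solve_relaxation(costs, weights, demand, fixed):
    """Solve the LP relaxation of a subproblem exactly.

    `fixed` maps a variable index to the value 0 or 1 it was fixed to.
    Returns (lower_bound, x), or None if the subproblem is infeasible.
    """
    n = len(costs)
    x = [Fraction(fixed.get(i, 0)) for i in range(n)]
    value = Fraction(sum(costs[i] for i, v in fixed.items() if v == 1))
    remaining = Fraction(demand - sum(weights[i] for i, v in fixed.items() if v == 1))
    # With a single covering constraint, filling by increasing cost/weight
    # ratio gives an optimal solution of the LP relaxation.
    free = sorted((i for i in range(n) if i not in fixed),
                  key=lambda i: Fraction(costs[i], weights[i]))
    for i in free:
        if remaining <= 0:
            break
        take = min(Fraction(1), remaining / weights[i])
        x[i] = take
        value += take * costs[i]
        remaining -= take * weights[i]
    if remaining > 0:
        return None
    return value, x


def branch_and_bound(costs, weights, demand):
    """Return (optimal value, optimal 0/1 vector), or (inf, None) if infeasible."""
    best_value, best_x = math.inf, None   # global upper bound and incumbent
    candidates = [{}]                      # list of candidate subproblems
    while candidates:
        fixed = candidates.pop()           # select a candidate (depth-first)
        result = solve_relaxation(costs, weights, demand, fixed)
        if result is None:
            continue                       # prune: no feasible solutions
        bound, x = result
        if bound >= best_value:
            continue                       # prune: bound is not better
        fractional = [i for i, v in enumerate(x) if 0 < v < 1]
        if not fractional:
            best_value, best_x = bound, [int(v) for v in x]  # better solution
            continue
        i = fractional[0]                  # branch on a fractional variable
        candidates.append({**fixed, i: 0})
        candidates.append({**fixed, i: 1})
    return best_value, best_x
```

For costs (4, 3, 5), weights (3, 2, 4), and demand 5, the search visits all four outcomes listed above (a new incumbent, an infeasible subproblem, a pruned subproblem, and branching) before the candidate list runs empty.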
4.2 Branch, Cut, and Price 

In many applications, the bounding operation is accomplished using the tools of 
linear programming (LP), a technique described in full generality, e.g., by Hoffman 
and Padberg [50]. This general class of algorithms is known as LP-based 
branch and bound. Typically, the integrality constraints of an integer programming 
formulation of the problem are relaxed to obtain an LP relaxation, which 
is then solved to obtain a lower bound for the problem. In [72], Padberg and 
Rinaldi improved on this basic idea by describing a method of using globally 
valid inequalities (i.e., inequalities valid for the convex hull of integer solutions) 
to strengthen the LP relaxation. This technique was called branch and cut. Since 
then, many implementations (including ours) have been fashioned after the ideas 
they described for solving the Traveling Salesman Problem. 

As an example, let a combinatorial optimization problem $CP = (E, \mathcal{F})$ with 
ground set $E$ and feasible set $\mathcal{F} \subseteq 2^E$ be given along with a cost function $c \in \mathbb{R}^E$. 
The incidence vectors corresponding to the members of $\mathcal{F}$ are sometimes specified 
as the set of all incidence vectors obeying a (relatively) small set of inequalities. 
These inequalities are typically the ones used in the initial LP relaxation. Now 
let $\mathcal{P}$ be the convex hull of incidence vectors of members of $\mathcal{F}$. Then we know 
by Weyl's Theorem (see [H]) that there exists a finite set $\mathcal{L}$ of inequalities valid 
for $\mathcal{P}$ such that 

$$\mathcal{P} = \{x \in \mathbb{R}^n : ax \le \beta \ \ \forall (a, \beta) \in \mathcal{L}\} = \{x \in \mathbb{R}^n : Ax \le b\}. \qquad (1)$$

The inequalities in $\mathcal{L}$ are the potential constraints, or cutting planes, to be added 
to the relaxation as needed. Unfortunately, it is usually difficult, if not impossible, 
to enumerate all of the inequalities in $\mathcal{L}$, else we could simply solve the 
problem using linear programming. Instead, they are defined implicitly and we 
use separation algorithms and heuristics to generate these inequalities when they 
are violated. In Fig. 1 we describe more precisely how the bounding operation 
is carried out in a branch and cut algorithm for combinatorial optimization. 

Once we have failed to either prune the current subproblem or separate the 
current relaxed solution from $\mathcal{P}$, we are forced to branch. The branching operation 
is usually accomplished by specifying a set of hyperplanes which divide the 
current subproblem in such a way that the current solution is not feasible for 
the LP relaxation of any of the new subproblems. For example, in a combinatorial 
optimization problem, branching could be accomplished simply by fixing a 
variable whose current value is fractional to 0 in one branch and 1 in the other. 
The procedure is described more formally in Fig. 2. Figure 3 gives a high-level 
description of the generic branch and cut algorithm. 

As with constraints, the columns of A can also be defined implicitly if n is 
large. If column $i$ is not present in the current matrix, then variable $x_i$ is implicitly 
taken to have value zero. The process of dynamically generating variables 
is called pricing in the jargon of linear programming, but can also be viewed as 
that of generating constraints for the dual of the current LP relaxation. Hence, 
LP-based branch and bound algorithms in which the variables are generated 
dynamically are known as branch and price algorithms. In m, Barnhart et al. 
provide a thorough review of these methods. 
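
The link between pricing and dual constraints can be illustrated with a small sketch. It assumes a minimization LP and a dual solution $y$ with the usual sign convention, so that an inactive column $A_j$ with cost $c_j$ is worth adding exactly when its reduced cost $c_j - y^T A_j$ is negative, i.e., when the dual constraint $y^T A_j \le c_j$ is violated. The function names and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def reduced_cost(cost: float, column: Sequence[float],
                 duals: Sequence[float]) -> float:
    """Reduced cost c_j - y^T A_j of a single column."""
    return cost - sum(y * a for y, a in zip(duals, column))


def price(inactive: Dict[int, Tuple[float, Sequence[float]]],
          duals: Sequence[float], tol: float = 1e-9) -> List[int]:
    """Return the indices of inactive variables that price out.

    `inactive` maps a variable index j to the pair (c_j, A_j). A variable
    is returned if its reduced cost is negative, which is the same as
    saying that its dual constraint is violated by `duals`.
    """
    return [j for j, (cost, column) in inactive.items()
            if reduced_cost(cost, column, duals) < -tol]
```

If price returns an empty list, no dual constraint is violated and the current LP solution is optimal for the full column set as well.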
Bounding Operation 

Input: A subproblem $S$, described in terms of a “small” set of inequalities $\mathcal{L}'$ such 
that $S = \{x^s : s \in \mathcal{F} \text{ and } a x^s \le \beta \ \forall (a, \beta) \in \mathcal{L}'\}$, and $\alpha$, an upper bound on the 
global optimal value. 

Output: Either (1) an optimal solution $s^* \in S$ to the subproblem, (2) a lower bound 
on the optimal value of the subproblem and the corresponding relaxed solution $\hat{x}$, 
or (3) a message pruned indicating that the subproblem should not be considered 
further. 

Step 1. Set $\mathcal{C} \leftarrow \mathcal{L}'$. 

Step 2. If the LP $\min\{cx : ax \le \beta \ \forall (a, \beta) \in \mathcal{C}\}$ is infeasible, then STOP and 
output pruned. This subproblem has no feasible solutions. 

Step 3. Otherwise, consider the LP solution $\hat{x}$. If $c\hat{x} < \alpha$, then go to Step 4. 
Otherwise, STOP and output pruned. This subproblem cannot produce a solution 
of value better than $\alpha$. 

Step 4. If $\hat{x}$ is the incidence vector of some $\hat{s} \in S$, then $\hat{s}$ is the optimal solution 
to this subproblem. STOP and output $\hat{s}$ as $s^*$. 

Step 5. Otherwise, apply separation algorithms and heuristics to $\hat{x}$ to obtain a 
set of violated inequalities $\mathcal{C}'$. If $\mathcal{C}' = \emptyset$, then $c\hat{x}$ is a lower bound on the value of 
an optimal element of $S$. STOP and return $\hat{x}$ and the lower bound $c\hat{x}$. 

Step 6. Otherwise, set $\mathcal{C} \leftarrow \mathcal{C} \cup \mathcal{C}'$ and go to Step 2. 



Fig. 1. Bounding in the branch and cut algorithm for combinatorial optimization 
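
The six steps of Fig. 1 translate almost line by line into a loop. In the sketch below the LP solver, the feasibility test, and the separation routine are passed in as callbacks, since the figure leaves them problem-specific; the function names and the one-dimensional toy instance are our own assumptions, chosen only so that the loop can be run without an LP library.

```python
def bounding(initial_inequalities, upper_bound, solve_lp, is_feasible, separate):
    """Cutting plane loop of the bounding operation.

    Returns ("pruned", None, None), ("optimal", x, value) or
    ("bound", x, value), matching the three outputs of the figure.
    """
    active = list(initial_inequalities)   # Step 1
    while True:
        result = solve_lp(active)         # Step 2
        if result is None:
            return "pruned", None, None   # LP infeasible
        value, x = result
        if value >= upper_bound:          # Step 3
            return "pruned", None, None   # cannot beat the upper bound
        if is_feasible(x):                # Step 4
            return "optimal", x, value
        violated = separate(x)            # Step 5
        if not violated:
            return "bound", x, value      # separation failed: report the bound
        active.extend(violated)           # Step 6, then back to Step 2


def toy_oracles(hidden_bounds):
    """Toy instance: minimize x subject to x >= b for every hidden bound b.

    The feasible set S consists of the integers satisfying all hidden
    bounds; an inequality is represented by its right-hand side b.
    """
    def solve_lp(active):
        x = max(active)                   # tightest active lower bound
        return x, x                       # objective value equals x

    def is_feasible(x):
        return float(x).is_integer() and all(x >= b for b in hidden_bounds)

    def separate(x):
        violated = [b for b in hidden_bounds if x < b]
        return violated[:1]               # one violated inequality per round

    return solve_lp, is_feasible, separate
```

With hidden bounds 0.5, 1.7, and 3.0 and only the first one active initially, the loop adds one cut per round and stops at x = 3; lowering the upper bound to 2.5 makes the same subproblem prune in Step 3 instead.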



Branching Operation 

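Input: A subproblem $S$ and $\hat{x}$, the LP solution yielding the lower bound. 

Output: $S_1, \ldots, S_p$ such that $S = \bigcup_{i=1}^{p} S_i$. 

Step 1. Determine sets $\mathcal{L}_1, \ldots, \mathcal{L}_p$ of inequalities such that 
$S = \bigcup_{i=1}^{p} \{x \in S : ax \le \beta \ \forall (a, \beta) \in \mathcal{L}_i\}$ and $\hat{x} \notin \bigcup_{i=1}^{p} S_i$. 

Step 2. Set $S_i = \{x \in S : ax \le \beta \ \forall (a, \beta) \in \mathcal{L}_i \cup \mathcal{L}'\}$, where $\mathcal{L}'$ is the set of 
inequalities used to describe $S$. 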



Fig. 2. Branching in the branch and cut algorithm 



When both variables and constraints are generated dynamically during LP- 
based branch and bound, the technique is known as branch, cut, and price (BCP). 
In such a scheme, there is a pleasing symmetry between the treatment of con- 
straints and that of variables. We further examine this symmetry later in these 
notes. For now, however, it is important to note that while branch, cut, and 
price does combine ideas from both branch and cut and branch and price (which 
themselves have many commonalities), combining the two techniques requires 
much more sophisticated methods than either requires alone. This is an impor- 
tant theme in what follows. 

In our descriptions, we will often use the term search tree. This term derives 
from the representation of the list of subproblems as the nodes of a graph in which 
each subproblem is connected only to its parent and its children. Storing the 
Generic Branch and Cut Algorithm 

Input: A data array specifying the problem instance. 

Output: A global optimal solution $s^*$ to the problem instance. 

Step 1. Generate a “good” feasible solution $\hat{s}$ using heuristics. Set $\alpha \leftarrow c(\hat{s})$. 

Step 2. Generate the first subproblem $S^I$ by constructing a small set $\mathcal{L}'$ of 
inequalities valid for $\mathcal{P}$. Set $A \leftarrow \{S^I\}$. 

Step 3. If $A = \emptyset$, STOP and output $\hat{s}$ as the global optimum $s^*$. Otherwise, choose 
some $S \in A$. Set $A \leftarrow A \setminus \{S\}$. Apply the bounding procedure to $S$ (see Fig. 1). 

Step 4. If the result of Step 3 is a feasible solution $\bar{s}$, then $c\bar{s} < c\hat{s}$. Set $\hat{s} \leftarrow \bar{s}$ and 
$\alpha \leftarrow c(\hat{s})$ and go to Step 3. If the subproblem was pruned, go to Step 3. Otherwise, 
go to Step 5. 

Step 5. Perform the branching operation. Add the set of subproblems generated 
to $A$ and go to Step 3. 



Fig. 3. Description of the generic branch and cut algorithm 



subproblems in such a form is an important aspect of our global data structures. 
Since the subproblems correspond to the nodes of this graph, they will sometimes 
be referred to as nodes in the search tree or simply as nodes. The root node or 
root of the tree is the node representing the initial subproblem. 

5 Design of SYMPHONY 

In the remainder of these notes, we will illustrate general principles applicable to 
implementing BCP by drawing on our experience with SYMPHONY. We thus 
begin with a high-level description of the framework. SYMPHONY was designed 
with two major goals in mind — ease of use and portability. With respect to ease 
of use, we aimed for a “black box” design, whereby the user would not be required 
to know anything about the implementation of the library, but only about the 
user interface. With respect to portability, we aimed not only for it to be possible 
to use the framework in a wide variety of settings and on a wide variety of 
hardware, but also for the framework to perform effectively in all these settings. 
Our primary measure of effectiveness was how well the framework would perform 
in comparison with a problem-specific (or hardware-specific) implementation 
written “from scratch.” 

The reader should be mindful of the fact that achieving such design goals 
involves a number of difficult tradeoffs, which we highlight throughout the rest of 
this text. For instance, ease of use is quite often at odds with efficiency. In many 
instances, we had to sacrifice some efficiency in order to make the code easy 
to work with and to maintain a true “black box” implementation. Maintaining 
portability across a wide variety of hardware, both sequential and parallel, also 
required some difficult choices. Sequential and shared-memory platforms demand 
memory-efficient data structures in order to maintain the very large search trees 
that can be generated. When moving to distributed platforms, these storage 
schemes do not scale well to large numbers of processors. This is further discussed 
in Sect. 7.1. 
5.1 An Object-Oriented Approach 

As we have already remarked, applying BCP to large-scale problems presents 
several difficult challenges. First and foremost is designing methods and data 
structures capable of handling the potentially huge numbers of constraints and 
variables that need to be accounted for during the solution process. The dynamic 
nature of the algorithm requires that we also be able to efficiently move 
constraints and variables in and out of the active set of each search node at 
any time. A second, closely-related challenge is that of effectively dealing with 
the very large search trees that can be generated for difficult problem instances. 
This involves not only the important question of how to store the data, but also 
how to move it between modules during parallel execution. A final challenge 
in developing a generic framework, such as SYMPHONY, is to deal with these 
issues using a problem-independent approach. 

Describing a node in the search tree consists of, among other things, speci- 
fying which constraints and variables are initially active in the subproblem. In 
fact, the vast majority of the methods in BCP that depend on the model are 
related to generating, manipulating, and storing the constraints and variables. 
Hence, SYMPHONY can be considered an object-oriented framework with the 
central “objects” being the constraints and variables. From the user’s perspec- 
tive, implementing a BCP algorithm using SYMPHONY consists primarily of 
specifying various properties of objects, such as how they are generated, how 
they are represented, and how they should be realized within the context of a 
particular subproblem. 

With this approach, we achieved the “black box” structure by separating 
these problem-specific functions from the rest of the implementation. The in- 
ternal library interfaces with the user’s subroutines through a well-defined Ap- 
plication Program Interface (API) and independently performs all the normal 
functions of BCP — tree management, LP solution, and cut pool management, as 
well as inter-process communication (when parallelism is employed). Although 
there are default options for many of the operations, the user can also assert 
control over the behavior of the algorithm by overriding the default methods or 
by manipulating the parameters. 

Although we have described our approach as being “object-oriented,” we 
would like to point out that SYMPHONY is implemented in C, not C++. To 
avoid inefficiencies and enhance the modularity of the code (allowing for easy 
parallelization), we used a more “function-oriented” approach for the implemen- 
tation of certain aspects of the framework. For instance, methods used for com- 
municating data between modules are not naturally “object-oriented” because 
the type of data being communicated is usually not known by the message- 
passing interface. It is also common that efficiency considerations require that a 
particular method be performed on a whole set of objects at once rather than 
on just a single object. Simply invoking the same method sequentially on each of 
the members of the set can be inefficient. In these cases, it is far better to define 
a method which operates on the whole set at once. In order to overcome these 
problems, we have also defined a set of interface functions, which are associated 
with the computational modules (Sect. 5.3). 

5.2 Data Structures and Storage 

Both the memory required to store the search tree and the time required to 
process a node are largely dependent on the number of objects (constraints and 
variables) active in each subproblem. Keeping this active set as small as possible 
is one of the keys to efficiently implementing BCP. For this reason, we chose data 
structures that enhance our ability to efficiently move objects in and out of the 
active set. Allowing sets of constraints and variables to move in and out of the 
linear programs simultaneously is one of the most significant challenges of BCP. 
We do this by maintaining an abstract representation of each global object that 
contains information about how to add it to a particular LP relaxation. 

In the literature on linear and integer programming, the terms constraint 
and row are often used interchangeably. Similarly, variable and column are often 
used with the same meaning. In many situations, this is appropriate and does 
not cause confusion. However, in object-oriented BCP frameworks, such as 
SYMPHONY or ABACUS [52], a constraint and a row are fundamentally different 
objects. A constraint (also referred to as a cut) is a user-defined representation 
of an abstract object which can only be realized as a row in an LP matrix with 
respect to a particular set of active variables. Similarly, a variable is a represen- 
tation which can only be realized as a column of an LP matrix with respect to 
a particular set of constraints. This distinction between the representation and 
the realization of objects is a crucial design element that allows us to effectively 
address some of the challenges inherent in BCP. In the remainder of this section, 
we further discuss this distinction and its implications. 



Variables. In SYMPHONY, problem variables are represented by a unique 
global index assigned to each variable by the user. This index indicates each 
variable’s position in a “virtual” global list known only to the user. The main 
requirement of this indexing scheme is that, given an index and a list of active 
constraints, the user must be able to generate the corresponding column to be 
added to the matrix. As an example, in problems where the variables corre- 
spond to the edges of an underlying graph, the index could be derived from a 
lexicographic ordering of the edges (when viewed as ordered pairs of nodes). 
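
As an illustration of such an indexing scheme, the following sketch maps the edges of a complete undirected graph on n nodes to positions in a lexicographically ordered list and back. The function names and the zero-based node numbering are assumptions made for this example; they are not part of the SYMPHONY interface.

```python
def edge_to_index(i: int, j: int, n: int) -> int:
    """Position of edge {i, j} in the list (0,1), (0,2), ..., (n-2,n-1)."""
    if i > j:
        i, j = j, i
    if not (0 <= i < j < n):
        raise ValueError("need two distinct nodes in range(n)")
    # Edges whose smaller endpoint is 0..i-1 come first: sum of (n-1-k) for k < i.
    return i * (n - 1) - i * (i - 1) // 2 + (j - i - 1)


def index_to_edge(idx: int, n: int) -> tuple:
    """Inverse of edge_to_index: recover the pair (i, j) with i < j."""
    if not (0 <= idx < n * (n - 1) // 2):
        raise ValueError("index out of range")
    i = 0
    # Skip whole blocks of edges that share the same smaller endpoint.
    while idx >= n - 1 - i:
        idx -= n - 1 - i
        i += 1
    return i, i + 1 + idx
```

Because the index can be computed from the edge and vice versa, no explicit global list of variables needs to be stored; only the indices of the active variables are kept.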

This indexing scheme provides a very compact representation, as well as a 
simple and effective means of moving variables in and out of the active set. 
However, it means that the user must have a priori knowledge of all problem 
variables and a method for indexing them. For combinatorial models such as 
the Traveling Salesman Problem, this does not present a problem. However, for 
other models such as airline crew scheduling (discussed below), for instance, the 
number of columns may not be known in advance. Even if the number of columns 
is known in advance, a viable indexing scheme may not be evident. Eliminating 
the indexing requirement by allowing variables to have abstract, user-defined 
representations (such as we do for constraints, as described in the next section), 
would allow for more generality, but would also sacrifice some efficiency. A hybrid 
scheme, allowing the user to have both indexed and algorithmic variables (variables 
with user-defined representations), has been implemented in COIN/BCP 
and is also planned for a future version of SYMPHONY. 

For efficiency, the problem variables can be divided into two sets, the core 
variables and the extra variables. The core variables are active in all subprob- 
lems, whereas the extra variables can be freely added and removed. There is 
no theoretical difference between core variables and extra variables; however, 
designating a well-chosen set of core variables can significantly increase effi- 
ciency. Because they can move in and out of the problem, maintaining extra 
variables requires additional bookkeeping and computation. If the user has rea- 
son to believe a priori that a variable has a high probability of having a non-zero 
value in some optimal solution to the problem, then that variable should be 
designated as a core variable. Core variable selection in the case of the Vehicle 
Routing Problem will be illustrated in Sect. 8.1. For a detailed description 
of core variable selection in the case of the Traveling Salesman Problem, see 
[→ Elf/Gutwenger/Jünger/Rinaldi]. In addition to the core variables, the user 
can also designate other variables that should be active in the root subproblem. 
Often, it is useful to activate these variables in the root, as it is likely they will 
be priced out quickly anyway. When not using column generation, all variables 
must be active in the root node. 



Constraints. Because the global list of potential constraints is not usually 
known a priori or is extremely large, constraints cannot generally be represented 
simply by a user-assigned index. Instead, each constraint is assigned a global 
index only after it becomes active in some subproblem. It is up to the user, if 
desired, to designate a compact representation for each class of constraints that is 
to be generated and to implement subroutines for converting from this compact 
representation to a matrix row, given the list of active variables. For instance, 
suppose that the set of variables with nonzero coefficients in a particular class 
of constraints corresponds to the set of edges across a cut in a graph. Instead 
of storing the index of each variable and its corresponding coefficient explicitly, 
one can simply store the set of nodes on one side (“shore”) of the cut as a bit 
array. The constraint can then be constructed easily for any particular set of 
active variables (see Sect. ED for more on this example). 
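
The distinction between representation and realization in this example can be sketched as follows. The cut is stored only as a bit array marking one shore, and a matrix row is produced on demand for whatever edge variables happen to be active. The unit coefficients, the function names, and the sparse (position, coefficient) row format are assumptions chosen for the illustration.

```python
def pack_shore(nodes) -> int:
    """Compact representation: bit v is set if node v lies on the shore."""
    bits = 0
    for v in nodes:
        bits |= 1 << v
    return bits


def realize_cut_row(shore_bits: int, active_edges) -> list:
    """Realize the cut as a sparse row for the given active variables.

    active_edges lists the (i, j) pairs of the active edge variables in
    column order. An edge gets coefficient 1 exactly when its endpoints
    lie on opposite sides of the cut.
    """
    row = []
    for col, (i, j) in enumerate(active_edges):
        i_on_shore = (shore_bits >> i) & 1
        j_on_shore = (shore_bits >> j) & 1
        if i_on_shore != j_on_shore:
            row.append((col, 1.0))
    return row
```

The same stored bit array yields different rows for different active sets, which is why the row itself never needs to be kept in the node description.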

Just as with variables, the constraints are divided into core constraints and 
extra constraints. The core constraints are those that are active in every subproblem, 
whereas the extra constraints can be generated dynamically and are free 
to enter and leave as appropriate. The set of core constraints must be known 
and constructed explicitly by the user. Extra constraints, on the other hand, 
are generated dynamically by the cut generator as they are violated. As with 
variables, a good set of core constraints can have a significant effect on efficiency. 

Note that the user is not required to designate a compact representation 
scheme. Constraints can simply be represented explicitly as matrix rows with 
respect to the global set of variables. However, designating a compact form can 
result in large reductions in memory use if the number of variables in the problem 
is large. 



Search Tree. Having described the basics of how objects are represented, we 
now describe the representation of search tree nodes. Since the core constraints 
and variables are present in every subproblem, only the indices of the extra 
constraints and variables are stored in each node’s description. A critical aspect 
of implementing BCP is the maintenance of a complete description of the current 
basis (assuming a simplex-based LP solver) for each node to allow a warm start 
to the computation. This basis is either inherited from the parent, computed 
during strong branching (see Sect. 6.2), or comes from earlier partial processing 
of the node itself (see Sect. 6.3). Along with the set of active objects, we must 
also store the identity of the object which was branched upon to generate the 
node. The branching operation is described in Sect. 6.2. 

Because the set of active objects and the status of the basis do not tend to 
change much from parent to child, all of these data are stored as differences with 
respect to the parent when that description is smaller than the explicit one. This 
method of storing the entire tree is highly memory-efficient. The list of nodes 
that are candidates for processing is stored in a heap ordered by a comparison 
function defined by the search strategy (see Sect. 6.3). This allows efficient generation 
of the next node to be processed. 
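
A minimal sketch of these two ideas, under simplifying assumptions, is given below. Only the set of active extra objects is stored (the basis is omitted), a child is stored as a difference whenever that is smaller than the explicit list, and the candidate list is a binary heap keyed by the lower bound. The class and function names are hypothetical and do not correspond to SYMPHONY's internal data structures.

```python
import heapq
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class NodeDesc:
    """Description of one search tree node (extra objects only)."""
    lower_bound: float
    parent: Optional["NodeDesc"] = None
    explicit: Optional[frozenset] = None   # full list, if stored explicitly
    added: frozenset = frozenset()         # otherwise: differences
    removed: frozenset = frozenset()       # with respect to the parent

    def active_extras(self) -> set:
        if self.explicit is not None:
            return set(self.explicit)
        return (self.parent.active_extras() - self.removed) | self.added


def make_child(parent: NodeDesc, active, lower_bound: float) -> NodeDesc:
    """Store the child as a diff only when the diff is the smaller description."""
    active = frozenset(active)
    base = parent.active_extras()
    added, removed = active - base, frozenset(base) - active
    if len(added) + len(removed) < len(active):
        return NodeDesc(lower_bound, parent, None, added, removed)
    return NodeDesc(lower_bound, parent, active)


class CandidateList:
    """Heap of candidate nodes ordered by a search-strategy key."""

    def __init__(self, key=lambda node: node.lower_bound):
        self._key = key
        self._heap = []
        self._tie = count()  # tie-breaker so nodes themselves are never compared

    def push(self, node: NodeDesc) -> None:
        heapq.heappush(self._heap, (self._key(node), next(self._tie), node))

    def pop(self) -> NodeDesc:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

Passing a different key function to CandidateList changes the search order without touching the storage scheme.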



5.3 Modular Implementation 

SYMPHONY’s functions are grouped into five independent computational modules. 
This modular implementation not only facilitates code maintenance, but 
also allows easy and highly configurable parallelization. Depending on the com- 
putational setting, the modules can be compiled as either (1) a single sequential 
code, (2) a multi-threaded shared-memory parallel code, or (3) separate pro- 
cesses running over a distributed network. The modules pass data to each other 
either through shared memory (in the case of sequential computation or shared- 
memory parallelism) or through a message-passing protocol defined in a separate 
communications API (in the case of distributed execution). A schematic overview 
of the modules is presented in Fig. 4. In the remainder of the section, we describe 
the modularization scheme and the implementation of each module in a 
sequential environment. We defer serious discussion of issues involved in parallel 
execution of the code until Sect. 7. 



The Master Module. The master module includes functions that perform 
problem initialization and I/O. These functions implement the following tasks: 

— Read in the parameters from a data file. 

— Read in the data for the problem instance. 

— Compute an initial upper bound using heuristics. 

— Perform problem preprocessing. 

— Initialize the BCP algorithm by sending data for the root node to the tree 
manager. 

— Initialize output devices and act as a central repository for output. 

— Process requests for problem data. 

— Receive new solutions and store the best one. 

— Receive the message that the algorithm has finished and print out data. 

— Ensure that all modules are still functioning. 

[Figure: The Modules of Branch, Cut, and Price] 

Fig. 4. Schematic overview of the branch, cut, and price algorithm 



The Tree Manager Module. The tree manager controls the overall execution 
of the algorithm. It tracks the status of all modules, as well as that of the search 
tree, and distributes the subproblems to be processed to the LP module(s). 
Functions performed by the tree manager module are: 

— Receive data for the root node and place it on the list of candidates for 
processing. 

— Receive data for subproblems to be held for later processing. 

— Handle requests from linear programming modules to release a subproblem 
for processing. 

— Receive branching object information, set up data structures for the children, 
and add them to the list of candidate subproblems. 

— Keep track of the global upper bound and notify all LP modules when it 
changes. 

— Write current state information out to disk periodically to allow a restart in 
the event of a system crash. 

— Keep track of run data and send it to the master program at termination. 



The Linear Programming Module. The linear programming (LP) module 
is the most complex and computationally intensive of the five modules. Its job is 
to use linear programming to perform the bounding and branching operations. 
These operations are, of course, central to the performance of the algorithm. 
Functions performed by the LP module are: 

— Inform the tree manager when a new subproblem is needed. 

— Receive a subproblem and process it in conjunction with the cut generator 
and the cut pool. 

— Decide which cuts should be sent to the global pool to be made available to 
other LP modules. 

— If necessary, choose a branching object and send its description back to the 
tree manager. 

— Perform the fathoming operation, including generating variables. 



The Cut Generator Module. The cut generator performs only one function: 
generating valid inequalities violated by the current LP solution and sending 
them back to the requesting LP module. Here are the functions performed by 
the cut generator module: 

— Receive an LP solution and attempt to separate it from the convex hull of 
all solutions. 

— Send generated valid inequalities back to the LP solver. 

— When finished processing a solution vector, inform the LP not to expect any 
more cuts in case it is still waiting. 



The Cut Pool Module. The concept of a cut pool was first suggested by 
Padberg and Rinaldi, and is based on the observation that in BCP, the 
inequalities which are generated while processing a particular node in the search 
tree are generally globally valid and potentially useful at other nodes. Since 
generating these cuts is sometimes a relatively expensive operation, the cut pool 
maintains a list of the “best” or “strongest” cuts found in the tree thus far for use 
in processing future subproblems. Hence, the cut pool functions as an auxiliary 
cut generator. More explicitly, the functions of the cut pool module are: 

— Receive cuts generated by other modules and store them. 

— Receive an LP solution and return a set of cuts which this solution violates. 

— Periodically purge “ineffective” and duplicate cuts to control its size. 
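
The three functions of the cut pool listed above can be sketched as a small class. This is an illustration under stated assumptions rather than SYMPHONY's implementation: cuts are taken to be inequalities of the form a·x ≤ b stored as a coefficient dictionary and a right-hand side, duplicates are detected by exact equality of this data, and a cut is treated as "ineffective" once it has gone unviolated for more than a chosen number of consecutive checks.

```python
class CutPool:
    """Toy cut pool: store cuts, report violated ones, purge stale ones."""

    def __init__(self):
        # canonical key -> [(coeffs, rhs), consecutive checks without violation]
        self._cuts = {}

    def add(self, coeffs: dict, rhs: float) -> None:
        key = (tuple(sorted(coeffs.items())), rhs)
        # setdefault leaves an existing entry alone, so duplicates are ignored.
        self._cuts.setdefault(key, [(dict(coeffs), rhs), 0])

    def violated_by(self, x: dict, tol: float = 1e-6) -> list:
        """Return all stored cuts a.x <= b that the point x violates."""
        found = []
        for entry in self._cuts.values():
            coeffs, rhs = entry[0]
            lhs = sum(c * x.get(j, 0.0) for j, c in coeffs.items())
            if lhs > rhs + tol:
                entry[1] = 0
                found.append((coeffs, rhs))
            else:
                entry[1] += 1
        return found

    def purge(self, max_idle: int) -> None:
        """Drop cuts that have been unviolated for more than max_idle checks."""
        self._cuts = {k: e for k, e in self._cuts.items() if e[1] <= max_idle}

    def __len__(self) -> int:
        return len(self._cuts)
```

In a real pool the purge criterion would more likely combine several measures of cut quality; the idle counter is only the simplest stand-in.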



5.4 SYMPHONY Overview 

Currently, SYMPHONY is a single-pool BCP algorithm. The term single-pool 
refers to the fact that there is a single central list of candidate subproblems to 
be processed, which is maintained by the tree manager. Most sequential imple- 
mentations use such a single-pool scheme. However, other schemes may be used 
in parallel implementations. For a description of various types of parallel branch 
and bound, see 031 . 

The master module begins by reading in the parameters and problem data. 
After initial I/O is completed, subroutines for finding an initial upper bound and 
constructing the root node are executed. During construction of the root node, 
the user must designate the initial set of active cuts and variables, after which 
the data for the root node are sent to the tree manager to initialize the list of 
candidate nodes. The tree manager in turn sets up the cut pool module(s), the 
linear programming module(s), and the cut generator module(s). All LP modules 
are marked as idle. The algorithm is now ready for execution. 

In the steady state, the tree manager controls the execution by maintaining 
the list of candidate subproblems and sending them to the LP modules as they 
become idle. The LP modules receive nodes from the tree manager, process 
them, branch (if required), and send back the identity of the chosen branching 
object to the tree manager, which in turn generates the children and places them 
on the list of candidates to be processed (see Sect. 6.2 for a description of the 
branching operation). The preference ordering for processing nodes is a run-time 
parameter. Typically, the node with the smallest lower bound is chosen to be 
processed next, since this “best-first” strategy minimizes the overall size of the 
search tree. However, at times it will be advantageous to dive down in the tree. 
The concepts of diving and search chains, introduced in Sect. 6.3, extend the 
basic best-first approach. 

We mentioned earlier that cuts and variables can be treated in a somewhat 
symmetric fashion. However, it should be clear by now that our current im- 
plementation favors branch and cut algorithms, where the computational ef- 
fort spent generating cuts dominates that of generating variables. Our methods 
of representation also clearly favor such problems. In a future version of the 
software, we plan to eliminate this bias by adding functionality for 
handling variable generation and storage. This is the approach already taken 
in COIN/BCP [27]. For more discussion of the reasons for this bias and the 
differences between the treatment of cuts and variables, see Sect. fOl 

6 Details of the Implementation 

6.1 The Master Module 

The primary functions performed by the master module were listed in Sect. 5.3. 
If needed, the user must provide a routine to read problem-specific parameters in 
from a parameter file. Also suggested is a subroutine for upper bounding, though 
upper bounds can also be provided explicitly. A good initial upper bound can 
dramatically decrease the solution time by allowing more variable-fixing (see 
Sect. 6.2 and also [→ Elf/Gutwenger/Jünger/Rinaldi]) and earlier pruning of 
search tree nodes. If no upper bounding subroutine is available, then the two- 
phase algorithm, in which a good upper bound is found quickly in the first 
phase using a reduced set of variables, can be useful (see Sect. 6.3 for details). 
The user’s only unavoidable obligation during preprocessing is to specify the list 
of core variables and, if desired, the list of extra variables that are to be active in 
the root node. Again, we point out that selecting a good set of core variables can 
make a marked difference in solution speed, especially when using the two-phase 
algorithm. 



6.2 The Linear Programming Module 

The LP module is at the core of the algorithm, as it performs the computationally 
intensive bounding operations for each subproblem. A schematic diagram of the 
LP solver loop is presented in Fig. 5. The details of the implementation are 
discussed in the following sections. 



The Linear Programming Engine. SYMPHONY requires the use of a third- 
party callable library (referred to as the LP engine or LP library) to solve the LP 
relaxations once they are formulated. As with the user functions, SYMPHONY 
communicates with the LP engine through an API that converts SYMPHONY’s 
internal data structures into those of the LP engine. Currently, the framework 
will only work with advanced, simplex-based LP engines, such as CPLEX [49], 
since the LP engine must be able to accept an advanced basis, and provide a 
variety of data to the framework during the solution process. The internal data 
structures used for maintaining the LP relaxations are similar to those of CPLEX 
and matrices are stored in the standard column-ordered format. 



Managing the LP Relaxation. The majority of the computational effort of 
BCP is spent solving LPs and hence a major emphasis in the development was 
to make this process as efficient as possible. Besides using a good LP engine, the 
primary way in which this is done is by controlling the size of each relaxation, 
both in terms of number of active variables and number of active constraints. 

Fig. 5. Overview of the LP solver loop 

The number of constraints is controlled through use of a local pool and 
through purging of ineffective constraints. When a cut is generated by the cut 
generator, it is first sent to the local cut pool. In each iteration, up to a spec- 
ified number of the strongest cuts (measured by degree of violation) from the 
local pool are added to the problem. Cuts that are not strong enough to be 
added to the relaxation are eventually purged from the list. In addition, cuts are 
purged from the LP itself when they have been deemed ineffective for more than 
a specified number of iterations, where ineffective is defined as either (1) the 
corresponding slack variable is positive, (2) the corresponding slack variable is 
basic, or (3) the dual value corresponding to the row is zero (or very small). Cuts 
that have remained effective in the LP for a specified number of iterations are 
sent to the global pool where they can be used in later search nodes. Cuts that 
have been purged from the LP can be made active again if they later become 
violated. 
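
The two mechanisms described in this paragraph can be sketched as follows. The first function picks at most a given number of the most violated cuts from the local pool; the second applies the three-part test for ineffectiveness to a single row. Cuts are assumed here to have the form a·x ≤ b and to be stored as a (coefficient dictionary, right-hand side) pair; the names and tolerances are illustrative assumptions.

```python
def violation(cut, x: dict) -> float:
    """Degree of violation of the cut a.x <= b at the point x."""
    coeffs, rhs = cut
    return sum(c * x.get(j, 0.0) for j, c in coeffs.items()) - rhs


def select_cuts(local_pool: list, x: dict, max_cuts: int, tol: float = 1e-6) -> list:
    """Return up to max_cuts violated cuts, most violated first."""
    scored = [(violation(cut, x), k) for k, cut in enumerate(local_pool)]
    violated = [(v, k) for v, k in scored if v > tol]
    violated.sort(key=lambda pair: -pair[0])  # stable: ties keep pool order
    return [local_pool[k] for _, k in violated[:max_cuts]]


def is_ineffective(slack: float, slack_is_basic: bool, dual: float,
                   tol: float = 1e-9) -> bool:
    """A row is ineffective if its slack is positive, its slack variable
    is basic, or its dual value is zero (up to the tolerance)."""
    return slack > tol or slack_is_basic or abs(dual) <= tol
```

A caller would keep a per-row counter of consecutive ineffective iterations and remove the row from the LP once that counter exceeds the specified limit.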

The number of variables (columns) in the relaxation is controlled through 
reduced cost fixing and dynamic column generation. Periodically, each active 
variable is priced to see if it can be fixed by reduced cost. That is, the LP reduced 
cost is examined in an effort to determine whether fixing that variable at one 
of its bounds would remove improving solutions; if not, the variable is fixed and 
removed from consideration. For a more detailed description of the conditions for 
fixing and setting by reduced cost, see [→ Elf/Gutwenger/Jünger/Rinaldi]. If 
the matrix is full at the time of the fixing, meaning that all unfixed variables are 
active, then the fixing is permanent for that subtree. Otherwise, it is temporary 
and only remains in force until the next time that columns are dynamically 
generated. 
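
The reduced cost test can be made concrete with a small sketch. It assumes a minimization problem with integer variables, so that moving a nonbasic variable off its bound changes it by at least one unit and raises the LP objective by at least the magnitude of its reduced cost. The function name and the status labels are assumptions for the example; the precise conditions used in practice are those given in the reference above.

```python
def fixable_by_reduced_cost(lp_value: float, upper_bound: float,
                            reduced_cost: float, status: str) -> bool:
    """Decide whether a nonbasic integer variable can be fixed at its bound.

    status is 'at_lower' or 'at_upper' for nonbasic variables and anything
    else (for example 'basic') otherwise. The variable can be fixed when
    moving it one unit off its bound would push the LP value to the upper
    bound or beyond, so no improving solution would be lost.
    """
    if status == "at_lower" and reduced_cost > 0:
        return lp_value + reduced_cost >= upper_bound
    if status == "at_upper" and reduced_cost < 0:
        return lp_value - reduced_cost >= upper_bound
    return False
```

This also shows why a good upper bound matters: the smaller the gap between the LP value and the upper bound, the more variables pass the test.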

Because SYMPHONY was originally designed for combinatorial problems 
with relatively small numbers of variables, techniques for performing dynamic 
column generation are somewhat unrefined. Currently, variables are priced out 
sequentially by index, which can be costly. To improve the process of pricing 
variables, we plan to increase the symmetry between our methods for handling 
variables and those for handling cuts. This includes (1) allowing user-defined, ab- 
stract representations for variables, (2) allowing the use of “variable generators” 
analogous to cut generators, (3) implementing both global and local pools for 
variables, (4) implementing heuristics that help determine the order in which the 
indexed variables should be priced, and (5) allowing for methods of simultane- 
ously pricing out large groups of variables. Much of this is already implemented 
in COIN/BCP. 

Because pricing is computationally burdensome, it currently takes place only 
either (1) before branching (optional), or (2) when a node is about to be pruned 
(depending on the phase; see the description of the two-phase algorithm in Sect. 
6.3). To use dynamic column generation, the user must supply a subroutine which 
generates the column corresponding to a particular user index, given the list of 
active constraints in the current relaxation. When column generation occurs, 
each column not currently active that has not been previously fixed by reduced 
cost is either priced out immediately, or becomes active in the current relaxation. 
Only a specified number of columns may enter the problem at a time, so when 
that limit is reached, column generation ceases. For further discussion of column 
generation, see Sect. 6.3, where the two-phase algorithm is described. 

Since the matrix is stored in compressed form, considerable computation 
may be needed to add and remove rows and columns. Hence, rows and columns 
are only physically removed from the problem when there are sufficiently many 
to make it “worthwhile.” Otherwise, deleted rows and columns remain in the 
matrix but are simply ignored by the computation. Note that because ineffective 
rows left in the matrix increase the size of the basis unnecessarily, it is usually 
advisable to adopt an aggressive strategy for row removal. 



Branching. Branching takes place whenever either (1) both cut generation and 
column generation (if performed) have failed; (2) “tailing off” in the objective 
function value has been detected (see [→ Elf/Gutwenger/Jünger/Rinaldi] for a 
description of tailing off); or (3) the user chooses to force branching. Branching 
can take place on cuts or variables and can be fully automated or fully controlled 
by the user, as desired. Branching can result in as many children as the user 
desires, though two is typical. Once it is decided that branching will occur, the 
user must either select the list of candidates for strong branching (see below 
for the procedure) or allow SYMPHONY to do so automatically by using one 
of several built-in strategies, such as branching on the variable whose value is 
farthest from being integral. The number of candidates may depend on the level 
of the current node in the tree — it is usually best to expend more effort on 
branching near the top of the tree. 

After the list of candidates is selected, each candidate is pre-solved, by per- 
forming a specified number of iterations of the dual simplex algorithm in each 
of the resulting subproblems. Based on the objective function values obtained in 
each of the potential children, the final branching object is selected, again either 
by the user or by a built-in rule. Using exploratory LP information 
in this manner to select a branching candidate is commonly referred to as 
strong branching. When the branching object has been selected, the LP module 
sends a description of that object to the tree manager, which then creates the 
children and adds them to the list of candidate nodes. It is then up to the tree 
manager to specify which node the now-idle LP module should process next. 
This issue is further discussed below. 
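
The selection step of strong branching can be sketched as below. The pre-solve itself is abstracted into a callback that returns the objective values reached in the potential children after a limited number of dual simplex iterations. The scoring rule shown, which prefers the candidate whose weakest child has the largest objective value, is one common choice and is an assumption here; it is not claimed to be one of SYMPHONY's built-in rules.

```python
def select_branching_candidate(candidates, presolve, score=min):
    """Pick the candidate with the best score over its pre-solved children.

    candidates: iterable of candidate branching objects.
    presolve:   hypothetical callback mapping a candidate to the list of
                objective values obtained in its potential children.
    score:      function reducing the child values to a single number;
                larger is better (min means "strongest weakest child").
    """
    best, best_score = None, float("-inf")
    for cand in candidates:
        child_values = presolve(cand)
        s = score(child_values)
        if s > best_score:  # strict: the first candidate wins ties
            best, best_score = cand, s
    return best
```

For example, with pre-solved child values of (10.5, 12.0) for one candidate and (11.0, 11.2) for another, the default rule selects the second, because both of its children raise the lower bound to at least 11.0.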



6.3 The Tree Manager Module 

The tree manager’s primary job is to control the execution of the algorithm by 
deciding which candidate node should be chosen as the next to be processed. 
This is done using either one of several built-in rules or a user-defined rule. 
Usually, the goal of the search strategy is to minimize overall running time, 
but it is sometimes also important to find good feasible solutions early in the 
search process. In general, there are two ways to decrease running time — either 
by decreasing the size of the search tree or by decreasing the time needed to 
process each search tree node. 

To minimize the size of the search tree, the strategy is to select consistently 
that candidate node with the smallest associated lower bound. In theory, this 
strategy, sometimes called best-first, will lead to the smallest possible search tree. 
However, we need to consider the time required to process each search tree 
node as well. This is affected by both the quality of the current upper bound 
and by such factors as communication overhead and node set-up costs. When 
considering these additional factors, it will sometimes be more effective to deviate 
from the best-first search order. We discuss the importance of such strategies 
below. 




Search Chains and Diving. One reason for not strictly enforcing the search 
order is that it is somewhat expensive to construct a search node, send it 
to the LP solver, and set it up for processing. If, after branching, we choose to 
continue processing one of the children of the current subproblem, we avoid the 
set-up cost, as well as the cost of communicating the node description of the 
retained child subproblem back to the tree manager. This is called diving and 
the resulting chain of nodes is called a search chain. There are a number of rules 
for deciding when an LP module should be allowed to dive. One such rule is to 
look at the number of variables in the current LP solution that have fractional 
values. When this number is low, there may be a good chance of finding a feasible 
integer solution quickly by diving. This rule has the advantage of not requiring 
any global information. We also dive if one of the children is “close” to being 
the best node, where “close” is defined by a chosen parameter. 
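
The two diving rules just described can be combined in a short decision function. This is a sketch under assumptions: the thresholds are hypothetical parameters, and "close" is interpreted as an additive gap between the child's lower bound and the best lower bound among all candidates, which is only one possible definition.

```python
def count_fractional(x, tol: float = 1e-6) -> int:
    """Number of entries of the LP solution that are not (nearly) integral."""
    return sum(1 for v in x if abs(v - round(v)) > tol)


def should_dive(x, child_bound: float, best_bound: float,
                max_fractional: int = 5, closeness: float = 0.0) -> bool:
    """Decide whether the LP module should keep one of the children.

    Rule 1 (local information only): dive if few variables are fractional.
    Rule 2 (needs the global best bound): dive if the child's lower bound
    is within 'closeness' of the best candidate's lower bound.
    """
    if count_fractional(x) <= max_fractional:
        return True
    return child_bound <= best_bound + closeness
```

Only the second rule requires information from the tree manager, which is why the first is attractive when communication is expensive.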

In addition to the time saved by avoiding reconstruction of the LP in the 
child, diving has the advantage of often leading quickly to the discovery of feasible 
solutions, as discussed above. Good upper bounds not only allow earlier pruning 
of unpromising search chains, but also should decrease the time needed to process 
each search tree node by allowing variables to be fixed by reduced cost. 



The Two-Phase Algorithm. If no heuristic subroutine is available for gener- 
ating feasible solutions quickly, then a unique two-phase algorithm can also be 
invoked. In the two-phase method, the algorithm is first run to completion on 
a specified set of core variables. Any node that would have been pruned in the 
first phase is instead sent to a pool of candidates for the second phase. If the 
set of core variables is small, but well-chosen, this first phase should be finished 
quickly and should result in a near-optimal solution. In addition, the first phase 
will produce a list of useful cuts. Using the upper bound and the list of cuts 
from the first phase, the root node is repriced — that is, it is reprocessed with 
the full set of variables and cuts. The hope is that most or all of the variables 
not included in the first phase will be priced out of the problem in the new root 
node. Any variable thus priced out can be eliminated from the problem globally. 
If we are successful at pricing out all of the inactive variables, we have shown 
that the solution from the first phase was, in fact, optimal. If not, we must go 
back and price out the (reduced) set of extra variables in each leaf of the search 
tree produced during the first phase. We then continue processing any node in 
which we fail to price out all the variables. 
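
The repricing step can be pictured with the following sketch for a minimization problem. It assumes each inactive variable sits at zero and has a known reduced cost at the LP optimum; the function and argument names are hypothetical. A variable is priced out when bringing it into the solution could not yield a value below the current upper bound.

```python
def price_out(reduced_costs, lower_bound, upper_bound):
    """Split inactive variables into those priced out and those kept.

    reduced_costs maps a variable name to its reduced cost at the LP
    optimum (minimization, variable currently at zero).
    """
    eliminated, kept = [], []
    for var, rc in reduced_costs.items():
        # Raising var to one costs at least rc, so any solution using it
        # has value at least lower_bound + rc.
        if lower_bound + rc >= upper_bound:
            eliminated.append(var)
        else:
            kept.append(var)
    return eliminated, kept
```

If `kept` is empty at the repriced root node, the first-phase solution is optimal; otherwise the remaining variables must be priced in the leaves.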

In order to avoid pricing variables in every leaf of the tree, we can trim the tree 
before the start of the second phase. Trimming the tree consists of eliminating 
the children of any node for which each child has lower bound above the current 
upper bound. We then reprocess the parent node itself. This is typically more 
efficient, since there is a high probability that, given the new upper bound and 
cuts, we will be able to prune the parent node and avoid the task of processing 
each child individually. 

6.4 The Cut Generator Module 

To implement the cut generator module, the user must provide a function that 
accepts an LP solution and returns cuts violated by that solution to the LP 
module. In parallel configurations, each cut is returned immediately to the LP 
module, rather than being returned within a group of cuts when the function 
terminates. This allows the LP to begin adding cuts and re-solving the current 
relaxation before the cut generator is finished, if desired. Parameters controlling 
if and when the LP should begin solving the new relaxation before the cut 
generator is finished can be set by the user. 

6.5 The Cut Pool Module 

Maintaining and Scanning the Pool. The cut pool’s primary job is to receive 
a solution from an LP module and return cuts from the pool that are violated 
by it. The cuts are stored along with two pieces of information — the level of the 
tree on which the cut was generated, known simply as the level of the cut, and 
the number of times it has been checked for violation since the last time it was 
actually found to be violated, known as the number of touches. The number 
of touches can be used as a simplistic measure of its effectiveness. Since the 
pool can get quite large, the user can choose to scan only cuts whose number of 
touches is below a specified threshold and/or cuts that were generated on a level 
at or above the current one in the tree. The idea behind this second criterion is 
to try to avoid checking cuts that were not generated “nearby” in the tree, as 
they are less likely to be effective. Any cut generated at a level in the tree below 
the level of the current node must have been generated in a different part of 
the tree. Although this is admittedly a naive method, it has proven reasonably 
effective in practice. 
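
A minimal sketch of this scanning scheme is given below. It assumes that the root is level 0 and that levels increase with depth; the class, the function, and the user-supplied `is_violated` callback are hypothetical names, since the pool itself has no knowledge of the cut data structures.

```python
from dataclasses import dataclass


@dataclass
class PooledCut:
    data: object      # user-defined cut representation, opaque to the pool
    level: int        # tree level at which the cut was generated
    touches: int = 0  # checks since the cut was last found violated


def scan_pool(pool, solution, node_level, is_violated, max_touches=10):
    """Return the pooled cuts violated by solution, updating touch counts."""
    violated = []
    for cut in pool:
        # Skip cuts that have been ineffective too often, and cuts that
        # were generated deeper in the tree than the current node.
        if cut.touches >= max_touches or cut.level > node_level:
            continue
        if is_violated(cut.data, solution):
            cut.touches = 0
            violated.append(cut)
        else:
            cut.touches += 1
    return violated
```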

On the other hand, the user may define a specific measure of quality for each 
cut to be used instead. For example, the degree of violation is an obvious candi- 
date. This measure of quality must be computed by the user, since the cut pool 
module has no knowledge of the cut data structures. The quality is recomputed 
every time the user checks the cut for violation and a running average is used as 
the global quality measure. The cuts in the pool are periodically sorted by this 
measure and only the highest quality cuts are checked each time. All duplicate 
cuts, as well as all cuts whose number of touches exceeds or whose quality falls 
below specified thresholds, are periodically purged from the pool in order to 
limit computational effort. 



Using Multiple Pools. For several reasons, it may be desirable to have multi- 
ple cut pools. When there are multiple cut pools, each pool is initially assigned 
to a particular node in the search tree. After being assigned to that node, the 
pool services requests for cuts from that node and all of its descendants until 
such time as one of its descendants is assigned to another cut pool. After that, it 
continues to serve all the descendants of its assigned node that are not assigned 
to other pools. 

Initially, the first pool is assigned to the root node. All other pools are
unassigned. During execution, when a new node is selected for processing, the tree
manager must determine which pool will service the node. The default is to as- 
sign the same pool as that of its parent. However, if there is currently an idle 
pool (either it has never been assigned to any node or all the descendants of 
its assigned node have been processed or reassigned), then that cut pool can be 
assigned to the new node. The new pool is initialized with all the cuts currently 
in the cut pool of the parent node, after which the two pools operate indepen- 
dently on their respective subtrees. When generating cuts, the LP module sends 
the new cuts to the cut pool assigned to service the node during whose processing 
the cuts were generated. 
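
The assignment rule can be sketched as follows. This is a simplified illustration: it assumes the tree manager already knows which pools are idle and passes them in, and all names are hypothetical.

```python
def choose_pool(parent_pool, idle_pools, cuts_by_pool):
    """Return the id of the pool that will serve a newly selected node.

    parent_pool  -- id of the pool serving the parent (None for the root)
    idle_pools   -- ids of pools the tree manager currently considers idle
    cuts_by_pool -- dict mapping pool id to the set of cuts it holds
    """
    if idle_pools:
        new_pool = idle_pools.pop(0)
        # Seed the idle pool with a copy of the parent's cuts; afterwards
        # the two pools evolve independently on their own subtrees.
        cuts_by_pool[new_pool] = set(cuts_by_pool.get(parent_pool, set()))
        return new_pool
    # Default: the node is served by the same pool as its parent.
    return parent_pool
```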

The primary motivation behind the idea of multiple cut pools is as follows. 
First, we want simply to limit the size of each pool as much as possible. By 
limiting the number of nodes that a cut pool has to service, the number of cuts 
in the pool will be similarly limited. This not only allows cut storage to be spread 
over multiple machines, and hence increases the available memory, but at the 
same time, the efficiency with which the cut pool can be scanned for violated 
cuts is also increased. A secondary reason for maintaining multiple cut pools is 
that it allows us to limit the scanning of cuts to only those that were generated 
in the same subtree. As described above, this helps focus the search and should 
increase the efficiency and effectiveness of the search. This idea also allows us to 
generate locally valid cuts, such as the classical Gomory cuts (see |7T]1. 

7 Parallelizing BCP 

Because of the clear partitioning of work that occurs when the branching opera- 
tion generates new subproblems, branch and bound algorithms lend themselves 
well to parallelization. As a result, there is already a significant body of research 
on performing branch and bound in parallel environments. We again refer the 
reader to the survey of parallel branch and bound algorithms by Gendron and 
Crainic, as well as other references such as [85, 46, 80, 57]. 

In parallel BCP, as in general branch and bound, there are two major sources 
of parallelism. First, it is clear that any group of subproblems on the current 
candidate list can be processed simultaneously. Once a subproblem has been 
added to the list, it can be properly processed before, during, or after the pro- 
cessing of any other subproblem. This is not to say that processing a particular 
node at a different point in the algorithm won’t produce different results — it 
most certainly will — but the algorithm will terminate correctly in any case. The 
second major source of parallelism is to parallelize the processing of individual 
subproblems. For instance, by allowing separation to be performed in parallel 
with the solution of the linear programs, we can theoretically process a node in 
little more than the amount of time it takes to perform the more expensive of 
these two operations. Alternatively, it is also possible to separate over several 
classes of cuts simultaneously. However, computational experience has shown 
that savings from parallelizing cut generation are difficult to achieve at best. 
Nonetheless, both of these sources of parallelism can be easily exploited using 
the SYMPHONY framework. 

The most straightforward parallel implementation, the one we currently em- 
ploy, is a master-slave model, in which there is a central manager responsible 
for partitioning the work and parceling it out to the various slave processes that 
perform the actual computation. This approach was adopted because it allows 
memory-efficient data structures for sequential computation and yet is concep- 
tually easy to parallelize. Unfortunately, this approach has limited scalability. 
We discuss design tradeoffs involving scalability in the next section. 

7.1 Scalability 

Overview of Scalability. We now digress slightly to discuss the importance of 
scalability in parallel algorithm development. Generally speaking, the scalability 
of a parallel system (the combination of a parallel algorithm and a parallel 
architecture) is the degree to which it is capable of efficiently utilizing increased 
computing resources (usually additional processors). To assess this capability, 
we compare the speed with which we can solve a particular problem instance in 
parallel to that with which we could solve it on a single processor. The sequential 
running time (T_0) is used as the basis for comparison and is usually taken to be 
the running time of the best available sequential algorithm. The parallel running 
time (T_p) is the running time of the parallel algorithm in question and depends 
on p, the number of processors available. The speedup (S_p) is simply the ratio 
T_0/T_p and hence also depends on p. Finally, the efficiency (E_p) is the ratio S_p/p 
of speedup to number of processors. 

In general, if the problem size is kept constant, efficiency drops as the number 
of processors increases — this is a product of the fact that there is a fixed fraction 
of work that is inherently sequential in nature (reading in the problem data, for 
example). This sequential fraction limits the theoretical maximum speedup (see 
[1]). However, if the number of processors is kept constant, then efficiency 
generally increases as problem size increases [58, 46, 47]. This is because the sequential 
fraction becomes smaller as problem size increases. Thus, we generally define 
scalability in terms of the rate at which the problem size must be increased with 
respect to the number of processors in order to maintain a fixed efficiency. For 
more details, see [57] . 
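
The quantities defined above are easy to compute. The sketch below also includes the classical bound known as Amdahl's law, which makes precise how a sequential fraction limits the maximum speedup; the function names are chosen for this example only.

```python
def speedup(t_seq, t_par):
    """Speedup S_p = T_0 / T_p."""
    return t_seq / t_par


def efficiency(t_seq, t_par, p):
    """Efficiency E_p = S_p / p."""
    return speedup(t_seq, t_par) / p


def amdahl_max_speedup(seq_fraction, p):
    """Upper bound on speedup with p processors when a fraction
    seq_fraction of the work is inherently sequential (Amdahl's law)."""
    return 1.0 / (seq_fraction + (1.0 - seq_fraction) / p)
```

For instance, with a sequential fraction of 0.1 the speedup can never exceed 10, no matter how many processors are used.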

Scalability for BCP. In order to maintain high parallel efficiency, it is critical 
not only to keep each processor busy, but to keep each processor busy with use- 
ful work. Hence, as in m, we differentiate between two different notions of load 
balancing — quantitative load balancing and qualitative load balancing. Quanti- 
tative load balancing consists of ensuring that the amount of work allocated to 
each processor is approximately equal. Qualitative load balancing, on the other 
hand, consists of ensuring not only that each processor has enough work to do, 
but also that each processor has high-quality work to do. 

The use of a single central tree manager has the advantage of making load 
balancing easy. Whenever a processor runs out of work, the tree manager can simply 
issue more. Furthermore, it can easily issue the “best” work that is available at 
that time, usually the subproblem with the least lower bound. Unfortunately, 
the central tree manager becomes a computational bottleneck when large num- 
bers of slave processes are employed. The degree to which this occurs is highly 
dependent on the problem setting. If each search tree node requires significant 
processing time (and hence the tree is not growing too quickly), then scalability 
may not be much of an issue. For problems in which quick enumeration of a 
large search tree is the primary computational approach, scalability will suffer. 
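
To make the role of the central tree manager concrete, the following sketch keeps the candidate list in a binary heap keyed on the lower bound, so that the "best" subproblem can be issued whenever a processor asks for work. The class and method names are assumptions for illustration and do not reflect SYMPHONY's data structures.

```python
import heapq
import itertools


class CandidateList:
    """Candidate subproblems ordered by lower bound (best-first)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # breaks ties between equal bounds

    def add(self, lower_bound, node):
        heapq.heappush(self._heap, (lower_bound, next(self._counter), node))

    def issue(self):
        """Hand out the node with the least lower bound, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def prune(self, upper_bound):
        """Discard candidates that cannot improve on the upper bound."""
        self._heap = [e for e in self._heap if e[0] < upper_bound]
        heapq.heapify(self._heap)
```

Every request for work passes through this single structure, which is exactly why it becomes a bottleneck when many processes finish nodes quickly.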

This problem has been studied extensively for general branch and bound and 
various approaches to “decentralization” have been suggested to relieve the bot- 
tleneck at the tree manager. However, while these approaches are more scalable, 
they appear to be inefficient when the number of processors is small, at least for 
our purposes. Moreover, they do not allow the use of our differencing scheme for 
storing the entire tree efficiently at a single processor. The straightforward im- 
plementation of such a scheme may, therefore, sacrifice our ability to solve large 
problems sequentially. Furthermore, fault tolerance could also be decreased (see 
Sect. 7.2). It is in view of these considerations that we employ the master-slave 
model. See Sect. 10.1 for a discussion of future improvements to scalability. 

7.2 Details of the Parallel Implementation 

Parallel Configurations. SYMPHONY supports numerous configurations, 
ranging from completely sequential to fully parallel, allowing efficient execution 
in many different computational settings. As described in the previous section, 
there are five modules in the standard distributed configuration. Various subsets 
of these modules can be combined to form separate executables capable of com- 
municating with each other across a network. When two or more modules are 
combined, they simply communicate through shared-memory instead of through 
message-passing. However, they are also forced to run in sequential fashion in this 
case, unless the user chooses to enable threading using an OpenMP compliant 
compiler (see next section). 

As an example, the default distributed configuration includes a separate ex- 
ecutable for each module type, allowing full parallelism. However, if cut gener- 
ation is fast and not memory-intensive, it may not be worthwhile to have the 
LP solver and its associated cut generator work independently, as this increases 
communication overhead without much potential benefit. In this case, the cut 
generator functions can be called directly from the LP solver, creating a single, 
more efficient executable. 

Inter-process Communication. SYMPHONY can utilize any third-party 
communication protocol supporting basic message-passing functions. All com- 
munication subroutines interface with SYMPHONY through a separate commu- 
nications API. Currently, PVM is the only message-passing protocol supported, 
but interfacing with another protocol is a straightforward exercise. 

Additionally, it is possible to configure the code to run in parallel using 
threading to process multiple search tree nodes simultaneously. Currently, this 
is implemented using OpenMP compiler directives to specify the parallel regions 
of the code and perform memory locking functions. Compiling the code with 
an OpenMP compliant compiler will result in a shared-memory parallel 
executable. For a list of OpenMP compliant compilers and other resources, visit 
http://www.openmp.org. 



Fault Tolerance. Fault tolerance is an important consideration for solving 
large problems on computing networks whose nodes may fail unpredictably. The 
tree manager tracks the status of all processes and can restart them as necessary. 
Since the state of the entire tree is known at all times, the most that will be 
lost if an LP process or cut generator process is killed is the work that had been 
completed on that particular search node. To protect against the tree manager 
itself or a cut pool being killed, full logging capabilities have been implemented. 
If desired, the tree manager can write out the entire state of the tree to disk 
periodically, allowing a restart if a fault occurs. Similarly, the cut pool process 
can be restarted from a log file. This not only allows for fault tolerance but also 
for full reconfiguration in the middle of solving a long-running problem. Such 
reconfiguration could consist of anything from adding more processors to moving 
the entire solution process to another network. 



8 Applications 

To make the ideas discussed thus far more concrete, we now introduce two prac- 
tical applications from combinatorial optimization with which many readers will 
already be familiar. Graph-based problems, especially those involving packing 
and routing constraints, lend themselves particularly well to implementation in 
this type of framework. This is because many of the constraints, such as those 
dealing with connectivity of the solution, can be represented compactly using bit 
vectors, as described previously. Also, the one-to-one correspondence between 
variables and edges in the underlying graph yields a simple variable indexing 
scheme based on a lexicographic ordering of the edges. We therefore begin by 
describing the use of SYMPHONY to implement a basic solver for the Vehicle 
Routing Problem [75, 74], and then move on to describe a Set Partitioning 
solver. Summary computational results will be given later in Sect. 9. 



8.1 The Vehicle Routing Problem 

The Vehicle Routing Problem (VRP) was introduced by Dantzig and Ramser |d2] 
in 1959. In this graph-based problem, a central depot {0} uses k independent 
delivery vehicles, each of identical capacity $C$, to service integral demands $d_i$ 
for a single commodity from customers $i \in N = \{1, \ldots, n\}$. Delivery is to be 
accomplished at minimum total cost, with $c_{ij}$ denoting the transit cost from $i$ 
to $j$, for $0 \le i, j \le n$. The cost structure is assumed symmetric, i.e., 
$c_{ij} = c_{ji}$ and $c_{ii} = 0$. 

A solution for this problem consists of a partition $\{R_1, \ldots, R_k\}$ of $N$ into 
$k$ routes, each satisfying $\sum_{j \in R_i} d_j \le C$, and a corresponding permutation, 
or tour, $\sigma_i$, of each route specifying the service ordering. This problem is 
naturally associated with the complete undirected graph consisting of nodes 
$N \cup \{0\}$, edges $E$, and edge-traversal costs $c_{ij}$, $\{i, j\} \in E$. A solution 
is a cloverleaf pattern whose $k$ petals correspond to the routes serviced by the 
$k$ vehicles. An integer programming formulation can be given as follows: 



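\begin{align*}
\min \quad & \sum_{e \in E} c_e x_e \\
& \sum_{e = \{0, j\} \in E} x_e = 2k \tag{2} \\
& \sum_{e = \{i, j\} \in E} x_e = 2 \quad \forall i \in N \tag{3} \\
& \sum_{\substack{e = \{i, j\} \in E \\ i \in S,\, j \notin S}} x_e \ge 2 b(S) \quad \forall S \subset N,\ |S| > 1 \tag{4} \\
& 0 \le x_e \le 1 \quad \forall e = \{i, j\} \in E,\ i, j \ne 0 \tag{5} \\
& 0 \le x_e \le 2 \quad \forall e = \{0, j\} \in E \tag{6} \\
& x_e \text{ integral} \quad \forall e \in E. \tag{7}
\end{align*}

Here, $b(S)$ can be any lower bound on the number of trucks needed to service the 
customers in set $S$, but for ease of computation, we define 
$b(S) = \lceil (\sum_{i \in S} d_i)/C \rceil$. 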



The constraint (2) stipulates that there should be exactly k routes, while the 
constraints (3) require that each customer be visited by exactly one vehicle. The 
constraints (4) ensure connectivity of the solution while also implicitly ensuring 
that no route services total demand in excess of the capacity C. 
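
As a small worked illustration of the capacity constraints (4), the sketch below computes the bound b(S) and the amount by which a given solution violates the constraint for a customer set S. The edge representation (a dictionary keyed by `frozenset({i, j})`) and the function names are assumptions made for this example.

```python
def trucks_lower_bound(S, demand, capacity):
    """b(S): total demand in S divided by the capacity, rounded up."""
    total = sum(demand[i] for i in S)
    return -(-total // capacity)  # exact integer ceiling


def capacity_violation(S, x, demand, capacity):
    """Return 2*b(S) minus the total value of edges crossing S.

    x maps frozenset({i, j}) to the value of the corresponding edge
    variable. A positive result means constraint (4) is violated for S.
    """
    S = set(S)
    crossing = sum(val for edge, val in x.items() if len(edge & S) == 1)
    return 2 * trucks_lower_bound(S, demand, capacity) - crossing
```

For example, with capacity 6 and demands 4 and 3 at customers 1 and 2, the single route 0-1-2-0 crosses S = {1, 2} only twice, whereas 2b(S) = 4, so the constraint is violated by 2.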

Solver Implementation. Implementing a BCP algorithm based on the above 
formulation is straightforward using the framework. As discussed in Sect. 5.2, 
our main concern is with the treatment of constraints and variables. The num- 
ber of variables is small enough for practical instances of this problem that we 
don’t need to concern ourselves with column generation. We tried using column 
generation but did not find it to be advantageous. Our indexing scheme for the 
variables is based on a lexicographic ordering of the edges in the complete graph. 
This enables an easily calculable one-to-one mapping of edges to indices. To con- 
struct the core of the problem, we select the variables corresponding to the k 
cheapest edges adjacent to each node in the graph, as these are the variables 
most likely to have a positive value in some optimal solution. The remainder 
of the variables are added as extra variables in the root node. The hope is that 
most of these will be priced out of the problem quickly. 
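
One possible realization of such an indexing scheme, together with the selection of the core edges, is sketched below. The source does not give the exact formula, so the particular ordering used here (edges sorted by larger endpoint, then by smaller endpoint) and all function names are assumptions.

```python
import math


def edge_index(i, j):
    """Map the undirected edge {i, j}, i != j, to a unique index 0, 1, 2, ..."""
    if i == j:
        raise ValueError("self-loops are not edges of the complete graph")
    lo, hi = (i, j) if i < j else (j, i)
    return hi * (hi - 1) // 2 + lo


def index_edge(idx):
    """Inverse of edge_index: recover (lo, hi) with lo < hi."""
    hi = (1 + math.isqrt(1 + 8 * idx)) // 2
    lo = idx - hi * (hi - 1) // 2
    return lo, hi


def core_edges(cost, n_nodes, k):
    """Indices of the k cheapest edges adjacent to each node."""
    core = set()
    for v in range(n_nodes):
        neighbors = sorted((u for u in range(n_nodes) if u != v),
                           key=lambda u: cost[v][u])
        core.update(edge_index(v, u) for u in neighbors[:k])
    return sorted(core)
```

Because both directions of the mapping are closed-form, no lookup table is needed to move between a variable's index and the edge it represents.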

Constraints present different challenges, however. The number of constraints 
in the LP relaxation is exponential. Furthermore, the separation problem for the 
constraints (4) is known to be NP-complete [H]. We therefore rely on heuristic 
separation routines to generate the constraints (4) dynamically during the search 
process. Because the number of degree constraints (2) and (3) is small and we 
want them to appear in all relaxations (our heuristics depend on this fact), 
they are placed in the core. Initially, these are the only active constraints in 
the root node. In the basic solver presented here, dynamic constraint generation 
takes place only for the capacity constraints (4). Since the sets of variables with 
nonzero coefficients in these constraints correspond to the sets of edges across cuts 
in the graph, these constraints can be represented as a bit array indicating the 
nodes included in one shore of the cut (the one not containing the depot). To 
construct the row corresponding to a particular constraint, it suffices to check 
the edge corresponding to each active variable in the current relaxation and 
determine if its endpoints are on opposite shores of the cut. If so, the variable 
has a coefficient of one in the row. Otherwise, its coefficient is zero. 
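
The row construction just described can be sketched in a few lines. Here the bit array is stored as a Python integer whose bit v is set when node v lies on the shore not containing the depot; this encoding and the function name are assumptions for illustration.

```python
def cut_row(shore_bits, active_edges):
    """Build the coefficient row of a capacity constraint.

    shore_bits   -- integer bit mask; bit v is 1 if node v is on the shore
                    of the cut that does not contain the depot
    active_edges -- list of (i, j) pairs, one per active variable, in
                    column order
    """
    def in_shore(v):
        return (shore_bits >> v) & 1

    # An edge gets coefficient one exactly when its endpoints lie on
    # opposite shores of the cut.
    return [1 if in_shore(i) != in_shore(j) else 0 for i, j in active_edges]
```

For the shore {1, 2} (bit mask `0b0110`) and the edges (0,1), (1,2), (0,2), (0,3), (2,3), the resulting row is 1, 0, 1, 0, 1.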

Besides cut generation, only a few other problem-specific routines are needed 
to implement the basic solver. Strong branching candidates are selected using a 
built-in default rule: select those variables nearest to 0.5 in value. Some logical 
fixing can also be done. For instance, if two edges adjacent to a particular node 
are already fixed to one, then all other edges adjacent to that node can be fixed 
to zero. 
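
Both routines are simple enough to sketch directly. The code below is only an illustration under stated assumptions: variable values are held in a dictionary, distance to 0.5 is measured on the fractional part, and the function names are hypothetical rather than part of the framework.

```python
def branching_candidates(x, n_candidates, tol=1e-6):
    """Select the fractional variables whose values are nearest to 0.5."""
    scored = []
    for var, val in x.items():
        frac = val % 1.0
        if tol < frac < 1.0 - tol:  # ignore variables at integral values
            scored.append((abs(frac - 0.5), var))
    scored.sort()
    return [var for _, var in scored[:n_candidates]]


def edges_to_fix_to_zero(incident_edges, fixed_to_one):
    """If two edges at a customer are fixed to one, all others can be fixed to zero."""
    n_ones = sum(1 for e in incident_edges if e in fixed_to_one)
    if n_ones >= 2:
        return [e for e in incident_edges if e not in fixed_to_one]
    return []
```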



8.2 Set Partitioning Problem 

In jSS], Eso used an early version of SYMPHONY to implement a solver for the 
Set Partitioning Problem (SPP). Here, we review her work. Combinatorially, 
the Set Partitioning Problem can be stated as follows. We are given a ground 
set S of m objects and a collection C of subsets $S_1, \ldots, S_n$ of S, each with a 
given cost $c_j = c(S_j)$. We wish to select the minimum weight subfamily of C 
that forms a partition of S. This problem is well-studied and describes many 
important applications, including airline crew scheduling, vehicle routing, and 
political districting (see I41I11I511TI?] !. 

To describe an integer programming formulation of the SPP, we construct 
matrix A, whose rows correspond to the members of S and whose columns 
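correspond to the members of C. Entry $a_{ij}$ is 1 if the $i$th element of S is included 
in subset $S_j$; otherwise, we set $a_{ij}$ to zero. Then the problem can simply be stated 
as 

\begin{align*}
\min \quad & \sum_{j=1}^{n} c_j x_j \tag{8} \\
\text{s.t.} \quad & \sum_{j=1}^{n} a_{ij} x_j = 1, \quad 1 \le i \le m \tag{9} \\
& x_j \in \{0, 1\}, \quad 1 \le j \le n. \tag{10}
\end{align*}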



Each row of this formulation expresses the constraint that there must be exactly 
one member of the partition containing each element of the set S. 

Solver Implementation. In crew scheduling applications, the matrix A can 
be extremely large and generating it can be difficult. Furthermore, solving the 
LP relaxation itself can also be very difficult. For the work in m, the matrix 
A was assumed given. Even so, preprocessing A, in order to reduce its size 
without eliminating any optimal solutions, was found to be vitally important. 
Since finding feasible solutions to this problem within the search tree can prove 
quite difficult, heuristic solution procedures were used to find good upper bounds. 
Both the preprocessing of the matrix and the execution of heuristic procedures 
were performed before the branch and cut procedure was invoked. It is important 
to note that performing intensive computation prior to beginning the branch 
and cut procedure can potentially decrease parallel speedup by increasing the 
computation’s sequential fraction (see Sect. 7.1). 

Unlike many combinatorial problems, for crew scheduling models, it was dif- 
ficult to judge a priori the relative importance of each column and its likelihood 
of participating in some optimal solution. This is because the magnitude of the 
objective function coefficient corresponding to a particular column is not neces- 
sarily a good indicator of its usefulness. Large objective function coefficients may 
simply correspond to columns representing large subsets of the set S. Because 
of this, the set of core variables was taken to be empty in order to allow removal 
by reduced cost and logical implications in the lower levels of the search tree. 
On the other hand, all constraints remaining after preprocessing were taken to 
be core constraints. 

Part of the difficulty inherent in crew scheduling models stems from the ex- 
tensive computation often required for solving the LP relaxations. In particular, 
the simplex algorithm sometimes has trouble optimizing these linear programs. 
Therefore, the barrier method with dual crossover was used to solve the ini- 
tial LP relaxation and derive a feasible basis for the dual simplex algorithm, 
which was then used for the remaining calculations. The same problem reduc- 
tion procedures that were so important during preprocessing were also employed 
throughout the tree to further reduce the matrix after branching or otherwise fix- 
ing variables. In addition, a primal heuristic was employed to derive new feasible 
solutions and hence improve the current upper bound. Candidates for branching 
were taken from among the variables, existing cuts that had become slack, and 
cuts produced specifically for branching. The algorithm employed detection of 
tailing off and forced branching whenever such a condition was detected. 

Many known classes of cutting planes can be used to strengthen the LP 
relaxations for this problem. Examples are clique inequalities, odd holes, packing 
and cover odd holes, odd antiholes, and other lifted versions of these classes. 
Because it was not clear which of these classes would produce the strongest cuts 
for a particular instance, the cut generator was itself parallelized in order to find 
cuts in several classes simultaneously. 

9 Computational Experience 

In this section, we describe our experience using the framework to solve the two 
classical combinatorial optimization problems we have already described (VRP 
and SPP). Although we report some running times below, what 
follows should be considered anecdotal evidence based on our observations and 
experience with the development of SYMPHONY. 



9.1 Sequential Performance 

Performance of our code when running sequentially has improved dramatically 
since the first published results in Eg. Although the same fundamental design 
has been maintained since the inception of the project, the implementation has 
been streamlined and improved to the point that the running time for a standard 
set of Vehicle Routing Problems, even after adjusting for increased processing 
speed, has improved by more than two orders of magnitude since 1995. This un- 
derscores the fact that, first and foremost, implementational details can produce 
a marked difference in efficiency for BCP algorithms. In the following sections, 
we summarize a few of the “details” that have proven to be important. However, 
we emphasize that the most effective way to learn about the implementation of 
these algorithms is to examine the documentation and source code itself m- 
It is well-known that the vast majority of the computing time for BCP is spent 
in two activities — solving LP relaxations and generating new cuts and variables. 
In SYMPHONY, both of these activities are performed by external libraries. 
The result is that, in practice, very little time is spent executing instructions 
that actually reside within the framework itself. Hence, improvements to running 
times must come not through reducing the time spent within the framework, but 
rather through reducing the time spent in code outside the framework. Although 
we have no control over the efficiency of these external codes, we can control not 
only the input to these external subroutines, but also the number of times they 
need to be called. To achieve real improvements in efficiency, one must guide the 
solution process with this in mind. 



Linear Program Management. To reduce time spent solving linear pro- 
grams, we emphasize once again that the most important concept is the ability 
to limit the size of the LP relaxations by allowing cuts and variables to be fluidly 
generated, activated, and deactivated at various points during the (iterative) so- 
lution process. This entire process must be managed in such a way as to reduce 
the size of the matrix while not reducing the quality of the solutions produced. 
It is also critical to maintain LP warm-start information (i.e., a description of 
the current basis) throughout the tree to allow efficient processing of each search 
tree node. 



Constraints. The most effective approach to managing the constraints in the 
LP relaxation has been to be conservative with adding constraints to the relax- 
ation while being liberal with removing them. We have found that by deleting 
ineffective constraints quickly, we can significantly reduce LP solution time. Of 
course, it would be better not to add these ineffective constraints in the first 
place. The local cut pools, which allow only the “best” constraints to be added 
in each iteration, have been instrumental in reducing the number of cuts that 
eventually do become ineffective. This combination approach has worked ex- 
tremely well. 

With respect to reducing the time spent generating constraints, the global 
cut pool is effective for constraints whose generation is relatively expensive. For 
constraints that are inexpensive to generate in comparison to the time spent 
solving LPs, the cost of maintaining the cut pool does not always pay off. Our 
approach to management of the cut pool has been similar to that of managing 
the constraints of the linear programs, but here it is less clear how to effectively 
keep its size under control. Our conservative policy with respect to adding con- 
straints to the pool has produced good results. However, the question of how to 
determine which constraints should be removed from the cut pool needs further 
consideration. 



Variables. Although cuts and variables can be handled symmetrically in many 
respects, there are some major differences. While generating cuts helps tighten 
the formulation and increase the lower bound, generating variables has the op- 
posite effect. Therefore, one must be somewhat careful about when variable gen- 
eration is invoked, as it destroys monotonicity of the objective function, upon 
which algorithmic performance sometimes depends. Furthermore, before a node 
can be properly fathomed in BCP, it is necessary to ensure that there are no 
columns whose addition to the problem could eliminate the conditions necessary 
to fathom the node in question, i.e., by either lowering the objective function 
value back below the current upper bound or by restoring feasibility. Thus, the 
user must be mindful of whether the node is about to be fathomed before per- 
forming column generation. 

In many problem settings, particularly those involving combinatorial opti- 
mization, it is much easier to judge a priori the importance of a particular vari- 
able (based on the problem structure and the structure of the objective function) 
than it is to judge the importance of a constraint. It is important to take advan- 
tage of this information. We have mentioned two different ways in which we can 
do this. By declaring some “unimportant” variables inactive in the root node, 
the user can delay including them in any LP relaxation until the column 
generation step. In the two-phase method with repricing in the root node (see Sect. 
6.3), it is possible, even probable, that these variables would simply be priced 
out immediately in the second phase. In theory, this should allow much faster 
processing of subproblems and less time spent solving LPs. 

In our experience with combinatorial problem solving, however, generating 
columns, unless done very efficiently, can be an expensive operation whose cost 
is not usually justified. This is especially true in the presence of a good initial 
upper bound. In this case, most “unimportant” columns end up being priced out 
either in the root node or relatively soon thereafter. Therefore, if efficient explicit 
generation of all variables is possible in the root node and there is sufficient 
memory to store them, this is generally the best option. This allows variables to 
be fixed by reduced cost and nodes to be fathomed without expensive pricing. 
However, if either (1) there is not enough memory to store all of the problem’s
columns at once, (2) it is expensive to generate the variables, or (3) there is
an efficient method of pricing large subsets of variables at once, then column
generation might be an effective technique.
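
As a rough illustration of what pricing a set of inactive columns involves, the following Python sketch computes the reduced costs c_j - y^T a_j of a minimization LP from a vector of dual values and returns the columns that could still lower the LP value. The function name, the dictionary-based column storage, the tolerance, and the data are assumptions made for this example; they are not taken from SYMPHONY.

```python
def price_columns(columns, costs, duals, tol=1e-9):
    """Return indices of inactive columns with negative reduced cost.

    columns: dict mapping column index -> dict {row index: coefficient}
    costs:   dict mapping column index -> objective coefficient c_j
    duals:   list of dual values y_i, one per row of the current LP
    """
    improving = []
    for j, col in columns.items():
        # Reduced cost of column j in a minimization LP: c_j - y^T a_j.
        reduced_cost = costs[j] - sum(duals[i] * a for i, a in col.items())
        if reduced_cost < -tol:
            improving.append((reduced_cost, j))
    # Most negative reduced cost first.
    improving.sort()
    return [j for _, j in improving]


# Hypothetical data: two rows, three inactive columns.
columns = {0: {0: 1.0}, 1: {0: 1.0, 1: 1.0}, 2: {1: 1.0}}
costs = {0: 3.0, 1: 4.0, 2: 1.0}
duals = [2.0, 3.0]
print(price_columns(columns, costs, duals))  # [2, 1]
```

An empty result would mean that none of the priced columns can lower the LP value, which is the condition that has to hold before a node is fathomed.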



Search Tree Management. The primary way in which solution time can be 
reduced is by reducing the size of the search tree. Effective branching, on both 
variables and constraints, is an important tool for this reduction. One of the 
most effective methods available for branching is strong branching, discussed in
Sect. 6.2. In a recent test on a set of Vehicle Routing Problems, it was found
that the number of search nodes explored was reduced (in comparison with stan- 
dard branching) by over 90% using strong branching with just seven candidate 
branching objects. For this reason, use of strong branching is highly recom- 
mended. Nonetheless, this branching should be delayed as long as possible. As 
long as violated cuts can be found and the relaxation solution value has not 
tailed off, processing of the current search tree node should continue. 
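
The selection step of strong branching can be sketched as follows. This is a minimal Python illustration, assuming that the LP bounds of the two children of each candidate have already been estimated (for instance by a few dual simplex iterations) and that a candidate is scored by the weaker of its two child bounds; both the scoring rule and the names are assumptions of this sketch, not a description of SYMPHONY's implementation.

```python
def strong_branch(candidates, child_bounds):
    """Pick the candidate whose weaker child bound is largest.

    candidates:   iterable of branching objects (any hashable labels)
    child_bounds: function mapping a candidate to a pair
                  (bound_down, bound_up) of LP bounds of its two children
    """
    best, best_score = None, float("-inf")
    for cand in candidates:
        down, up = child_bounds(cand)
        score = min(down, up)  # progress requires both children to improve
        if score > best_score:
            best, best_score = cand, score
    return best, best_score


# Hypothetical pre-computed child bounds for three candidates.
bounds = {"x12": (100.0, 104.0), "x37": (101.5, 102.0), "x45": (100.2, 110.0)}
print(strong_branch(bounds, bounds.get))  # ('x37', 101.5)
```

Limiting the candidate list to a handful of objects keeps the cost of the extra LP work small, which matches the observation above that seven candidates already gave a large reduction.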

Another way to affect the size of the search tree is through an effective search
strategy, as we discussed in Sect. 6.3. We have found that a hybrid of “best-first”
along with controlled diving produces the best results. Diving leads to improved upper
bounds and reductions in node set-up costs while minimizing communication 
with the tree manager. This can be significant when employing parallelism. 



9.2 Parallel Performance 

Given the modular design of SYMPHONY, the transition from sequential to 
parallel processing is straightforward. However, the centralized management of 
the search process and centralized storage of the search tree, while highly effective 
for sequential processing, does not lead to a scalable parallel algorithm. That 
said, the parallel efficiency of our approach is very high for small numbers of 
processors. For many practical settings, this is all that is needed. Furthermore, 
as discussed earlier, this parallelism is achieved without sacrificing any of the 
efficiencies to be gained in the more typical sequential setting. 

Parallel efficiency in BCP is achieved mainly through effective qualitative 
and quantitative load balancing. For small numbers of processors, our approach 
handles load balancing with ease. The search trees produced in parallel runs are 
approximately the same size as those produced sequentially, leading to linear 
speedup. See the sample computational results in the next section for an example 
of this. 

As the number of processors increases, the tree manager eventually becomes
a computational bottleneck. As indicated in Sect. 7.1, the point at which this
happens is highly problem-dependent. When this point is reached, parallel
efficiency can be increased by limiting communication between the tree manager
and the LP solver as much as possible. For instance, increased diving can
significantly enhance large-scale parallel efficiency. For more ideas on improving
the scalability of these algorithms, see the discussion in Sect. 10.1.



9.3 Computational Results for Vehicle Routing

Our experience with the VRP has reinforced many of the lessons already dis- 
cussed. We experimented with column generation, but found it to be more effec- 
tive to simply include all variables in the root node. However, placing only the 
k shortest edges adjacent to each customer node in the problem core did lead to 
significant gains. We found strong branching to be an extremely effective tool 
for reducing the size of the search tree. However, we experienced diminishing 
returns when examining more than 7-10 candidates. We used a hybrid diving 
strategy that allowed us to find feasible solutions early in the search process. In 
many cases, this led to running times almost equaling those achieved by com- 
puting a heuristic upper bound prior to beginning the branch and cut phase of 
the algorithm. Although effective constraint generation is absolutely critical for 
this problem, use of the cut pool did not have a dramatic effect. This is probably 
because the cuts we applied are relatively easy to generate. 

Table 1 presents summary computational results for recent testing of the
basic VRP solver discussed earlier. These results are not intended to be compre- 
hensive, but are presented for illustrative purposes. For more detailed discussion 
of using SYMPHONY to solve the VRP and more in-depth computational re- 
sults, see m- Tests here were performed on a network of 3 workstations powered 
by 4 DEC Alpha processors each using CPLEX as the LP engine. The problems 
are easy- to medium-difficulty problems from VRPLIB [81] and other sources 

m- 

From Table 1, it is evident that the number of nodes in the search tree is
largely independent of the number of LP processes being utilized. This essen- 
tially ensures linear speedup as long as parallel overhead remains low and there 
are no computational bottlenecks. Predictably, as the number of processors in- 
creases, the idle time also increases, indicating that the tree manager is becoming 
saturated with requests for data. Because not many strong cuts are known for 
this model, we tend to rely on quick enumeration to solve these problems. This 
is possible because the LP relaxations are relatively easy to solve. It is therefore 
common to develop large search trees in a short period of time. Our compact 
storage scheme allows us to deal with these large search trees. However, scala- 
bility suffers in this situation. 
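
The speedup implied by the wallclock times in Table 1 can be checked with a few lines of Python. The sketch below takes the one-process run as the baseline; the function name is an assumption made for this example.

```python
# Wallclock solution times (sec) from Table 1, keyed by number of LP processes.
wallclock = {1: 2493, 2: 1281, 4: 666, 8: 404}


def speedup_and_efficiency(times):
    """Return {p: (speedup, efficiency)} relative to the 1-process run."""
    base = times[1]
    return {p: (base / t, base / (t * p)) for p, t in times.items()}


for p, (s, e) in speedup_and_efficiency(wallclock).items():
    print(f"{p} LP processes: speedup {s:.2f}, efficiency {e:.2f}")
```

With these figures the efficiency is about 0.97 with two LP processes, 0.94 with four, and 0.77 with eight, which is consistent with the growth in idle time per node.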



Table 1. Summary Computational Results for VRP instances

Number of LP processes used             1       2       4       8
Number of search tree nodes          6593    6691    6504    6423
Wallclock solution time (sec)        2493    1281     666     404
Wallclock solution time per node     0.38    0.38    0.41    0.50
Idle time per node                   0.00    0.01    0.03    0.08



9.4 Computational Results for Set Partitioning

In contrast to the Vehicle Routing Problem, the Set Partitioning Problem pro- 
duces relatively small search trees but search nodes can be much more difficult to 
process. In particular, the simplex algorithm had difficulty in solving some of the 
LP relaxations encountered in this problem. However, as mentioned earlier, Eso 
does report success with using the barrier method with dual crossover to solve 
the initial LP relaxations. Problem size reduction techniques help to control the 
size of the LP relaxations, leading to reduced node processing times. 

In this problem, tailing off can be a problem, so branching was invoked when- 
ever the lower bound did not show significant improvement over a sequence of 
iterations. As in the VRP, strong branching was found to be an important tool 
in reducing the size of the search tree. However, for the SPP, choosing specially 
constructed cuts, in addition to variables, as branching candidates was found 
to be important. Dynamic cut generation was also critical to efficiency. Exten- 
sive computational results on a variety of problems (assembled from the crew 
scheduling literature) are reported in [IdtiJ . The implementation solved many of 
these problems to optimality in the root node of the enumeration tree, so there 
was no need in such instances for parallel processing. Of the remaining prob- 
lems, some proved too difficult to solve, mainly due to difficulties in solving LP 
relaxations, as indicated above. Several of the more difficult models did yield 
solutions, however, with significant multi-processing effort. The principles ob- 
served here were similar to those for the VRP: The number of search tree nodes 
was essentially independent of the number of LP processes, resulting in linear 
speed-up for as many as 8 LP processes. For example, with model aa04, a well-known
difficult problem taken from |^, the results shown in Table 2 were obtained
(see also Table 5.12 of m). The computing platform is an IBM
SP2 with LP solver CPLEX. 



Table 2. Sample computational results for the crew scheduling model aa04

Number of LP processes used             1       2       4       8
Number of search tree nodes           283     268     188     234
Depth of search tree                   25      22      16      17
Wallclock solution time (sec)        2405     nil     350     240



10 Future Development 

Although the theory underlying BCP algorithms is well-developed, our knowl- 
edge of how to implement these algorithms continues to improve and grow. To 
some extent, effective implementation of these algorithms will continue to depend 
on problem-specific techniques, especially cut generation. However, we have
already learned a great deal about how to remove certain burdens from the user by
implementing generic defaults that work well across a wide variety of problems.
In this section, we offer a few ideas about where future growth will occur. 



10.1 Improving Parallel Scalability 

With the state of technology driving an increasing interest in parallel compu- 
tation, it is likely that parallel algorithms will continue to play an important 
role in the field of optimization. In these notes, we have touched on some of 
the central issues surrounding the parallelization of BCP algorithms, but much 
remains to be learned. In particular, more scalable approaches to BCP need to 
be developed. As we have already pointed out, this clearly involves some degree 
of decentralization. However, the schemes that have appeared in the literature 
(mostly applied to parallel branch and bound) appear inadequate for the more 
complex challenges of BCP. 

The most straightforward approach to improving scalability is simply to in- 
crease the task granularity and thereby reduce the number of decisions the tree 
manager has to make, as well as the amount of data it has to send and receive. 
To achieve this, we could simply allow each LP process to examine an entire 
subtree or portion of a subtree before checking back for additional work. This 
approach would be relatively easy to implement, but has some potentially se- 
rious drawbacks. The most serious of these is that the subtree being examined 
could easily turn out to contain mostly unpromising nodes that would not have 
been examined otherwise. Hence, this scheme seems unlikely to produce positive 
results in its most naive form. 

Another approach is to attempt to relieve the bottleneck at the central tree 
manager by only storing the information required to make good load-balancing 
decisions (most notably, the lower bound in each search tree node) centrally. 
The data necessary to generate each search node could be stored either at one 
of a set of “subsidiary tree managers” or within the LP modules themselves. 
This is similar to a scheme implemented by Eckstein m for parallel branch 
and bound. Such a scheme would maintain the advantages of global decision- 
making while moving some of the computational burden from the tree manager 
to other processes. However, effective load balancing would necessarily involve 
an expensive increase in the amount of data being shuttled between processes. 
Furthermore, the differencing scheme we use for storing the search tree will not 
extend easily to a decentralized environment. 



10.2 Other Directions 

The vast majority of research on BCP has concentrated on the now well-studied 
technique of branch and cut. Branch and price, on the other hand, has received 
relatively little attention and the integration of these two methods even less. 
In particular, issues related to if, when, and how to effectively generate new 
variables, independent of the problem setting, need further investigation.
Effective management of pools for both cuts and variables is another important
computational issue which deserves attention. As we pointed out several times,
SYMPHONY currently has a bias towards the implementation of branch and cut
algorithms. We intend to improve and generalize our implementation of variable 
generation in order to make the framework more flexible and efficient for branch 
and price. 

Outside of branching on fractional variables, few generic branching rules have 
been developed. Most BCP implementations still rely on variable branching be- 
cause it is easy to implement and relatively effective. However, there are situa- 
tions in which it can be ineffective when compared to branching on a well-selected 
cut or on a set of objects. Automatic methods of determining which cuts will 
make effective branching objects have yet to be examined. 

Until recently, almost all BCP algorithms have utilized simplex-based LP 
solvers to perform lower bounding. Currently, these solvers still offer the best 
performance across a wide range of problems. However, new solution techniques, 
such as the volume algorithm (see m) are showing promise in helping to solve 
those problems on which the simplex algorithm falters. As discussed in Sect. 8.2,
we have already seen in [36] that the barrier method successfully provided an
alternative to the simplex method in solving large-scale LP problems arising in 
airline crew scheduling models. Relaxation schemes that use techniques other 
than linear programming, e.g., semi-definite programming, are also coming into 
prominence. Interfacing with these new solvers should provide fruitful avenues 
for further improvement in BCP methodology. 



11 Conclusion 

In these notes, we have given the reader a summary of many important chal- 
lenges of implementing branch, cut, and price algorithms. However, there are 
many more details to be explored below the surface. We encourage the interested
reader to visit http://BranchAndCut.org for more extensive documentation and
source code for SYMPHONY.



Abstract. The first computer implementation of the Dantzig-Fulkerson-
Johnson cutting-plane method for solving the traveling salesman prob- 
lem, written by Martin, used subtour inequalities as well as cutting planes 
of Gomory’s type. The practice of looking for and using cuts that match 
prescribed templates in conjunction with Gomory cuts was continued in 
computer codes of Miliotis, Land, and Fleischmann. Grötschel, Padberg,
and Hong advocated a different policy, where the template paradigm is
the only source of cuts; furthermore, they argued for drawing the templates
exclusively from the set of linear inequalities that induce facets of
the TSP polytope. These policies were adopted in the work of Crowder
and Padberg, in the work of Grötschel and Holland, and in the work of
Padberg and Rinaldi; their computer codes produced the most impressive 
computational TSP successes of the nineteen eighties. Eventually, the 
template paradigm became the standard frame of reference for cutting 
planes in the TSP. The purpose of this paper is to describe a technique 
for finding cuts that disdains all understanding of the TSP polytope and 
bashes on regardless of all prescribed templates. Combining this technique
with the traditional template approach was a crucial step in our
solutions of a 13,509-city TSP instance and a 15,112-city TSP instance. 



1 The Cutting-Plane Method and Its Descendants 

The groundbreaking work of Dantzig, Fulkerson, and Johnson [19] on the traveling
salesman problem introduced the cutting-plane method, which can be used
to attack any problem

$$\text{minimize } c^T x \text{ subject to } x \in S, \tag{1}$$

where $S$ is a finite subset of some Euclidean space $\mathbb{R}^m$, provided that an
efficient algorithm to recognize points of $S$ is available. This method is iterative;
each of its iterations begins with a linear programming relaxation of (1), meaning a
problem

$$\text{minimize } c^T x \text{ subject to } Ax \le b, \tag{2}$$

where the polyhedron $P$ defined as $\{x : Ax \le b\}$ contains $S$ and is bounded.
Since $P$ is bounded, we can find an optimal solution $x^*$ of (2) which is an
extreme point of $P$. If $x^*$ belongs to $S$, then it constitutes an optimal solution of
(1); otherwise, some linear inequality separates $x^*$ from $S$ in the sense of being
satisfied by all the points in $S$ and violated by $x^*$; such an inequality is called
a cutting plane or simply a cut. In the latter case, we find a family of cuts, add
them to the system $Ax \le b$, and use the resulting tighter relaxation of (1) in the
next iteration of the cutting-plane method.
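
The iteration just described can be sketched as a short Python loop. This is only an illustration under stated assumptions: the three callables stand in for an LP solver, a membership test for S, and a cut-finding routine, and the one-dimensional toy instance (S = {0, 1, 2, 3}, minimize -x, initial relaxation 0 <= x <= 3.7) is invented for the example.

```python
import math


def cutting_plane(solve_lp, in_S, find_cuts, max_iter=100):
    """Generic cutting-plane loop.

    solve_lp(cuts) -> optimal solution x* of the relaxation with cuts added
    in_S(x)        -> True if x belongs to S
    find_cuts(x)   -> nonempty list of cuts valid for S and violated by x
    """
    cuts = []
    for _ in range(max_iter):
        x_star = solve_lp(cuts)
        if in_S(x_star):
            return x_star, cuts
        cuts.extend(find_cuts(x_star))
    raise RuntimeError("iteration limit reached")


# Toy instance: a cut is represented by an upper bound u, meaning x <= u.
def solve_lp(cuts):
    return min([3.7] + cuts)      # minimizing -x pushes x to its tightest bound


def in_S(x):
    return float(x).is_integer() and 0 <= x <= 3


def find_cuts(x):
    return [math.floor(x)]        # x <= floor(x*) holds for every integer point


print(cutting_plane(solve_lp, in_S, find_cuts))  # (3, [3])
```

In a real application only `find_cuts` is problem-specific, which is the challenge the next paragraph refers to.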

Each iteration of the method requires first finding x* and then finding a 
family of cuts. Finding x* presents no problem: this is what the simplex method 
and other LP algorithms are for. Finding cuts is the challenge that has to be 
answered with each new application of the cutting-plane method; we shall return 
to this challenge later. 

Progress of the cutting-plane method towards solving a particular instance
of problem (1) is often estimated by the increase in the optimal value of its LP
relaxation; as more and more cuts are added, these increases tend to get smaller
and smaller. When they become unbearably small, the sensible thing to do may
be to branch: having chosen a vector $\alpha$ and numbers $\beta'$, $\beta''$ with $\beta' < \beta''$ such
that

$$\alpha^T x^* \in (\beta', \beta'') \text{ and } \{\alpha^T x : x \in S\} \subset (-\infty, \beta'] \cup [\beta'', +\infty),$$

we solve the two subproblems,

$$\text{minimize } c^T x \text{ subject to } x \in S,\ \alpha^T x \le \beta'$$

and

$$\text{minimize } c^T x \text{ subject to } x \in S,\ \alpha^T x \ge \beta'',$$

separately. (If all the elements of $S$ are integer vectors and some component $x^*_e$
of $x^*$ is not an integer, then we may choose $\alpha$ so that $\alpha^T x$ is identically equal
to $x_e$ and set $\beta' = \lfloor x^*_e \rfloor$, $\beta'' = \lceil x^*_e \rceil$.) At some later time, one or both of these
two subproblems may be split into sub-subproblems, and so on; in the resulting
binary tree of subproblems, each node has the form

$$\text{minimize } c^T x \text{ subject to } x \in S,\ Cx \le d \tag{3}$$

for some system $Cx \le d$ of linear inequalities and each leaf will have been either
solved without recourse to branching or else found irrelevant since the optimal
value of its LP relaxation turned out to be at least as large as $c^T x$ for some
previously known element $x$ of $S$. This scheme is one of the many variants of the
branch-and-bound method. (The term “branch-and-bound”, coined by Little,
Murty, Sweeney, and Karel [H], refers to a general class of algorithms that
originated in the work of Bock [7], Croes m, Eastman m, Rossman and Twery
[65], and Land and Doig [42]; in this more general context, relaxations of (1) may
come from a universe far wider than that of linear programming relaxations (2)
and each subproblem may be split into more than two sub-subproblems.)
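
For the integer case mentioned in the parenthetical remark above, the choice of a branching variable and of the two bounds can be sketched in Python as follows. Picking the component farthest from an integer is an assumption of this sketch (the text allows any fractional component), as are the function name and the tolerance.

```python
import math


def branch_on_fractional(x_star, tol=1e-6):
    """Return (e, beta1, beta2) for the most fractional component, or None.

    x_star: dict mapping a variable label e to its LP value x*_e.
    The two subproblems add x_e <= beta1 and x_e >= beta2, respectively.
    """
    best, best_dist = None, tol
    for e, value in x_star.items():
        dist = abs(value - round(value))   # distance to the nearest integer
        if dist > best_dist:
            best, best_dist = e, dist
    if best is None:
        return None                        # x* is already integral
    return best, math.floor(x_star[best]), math.ceil(x_star[best])


print(branch_on_fractional({"a": 1.0, "b": 0.3, "c": 2.5}))  # ('c', 2, 3)
```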

Computer codes written by Hong [39], Miliotis [48], and Grötschel, Jünger,
and Reinelt [33] introduced a particular variant of this variant, where each
subproblem is attacked by the cutting-plane method; in these codes, the cuts
introduced in solving (3) are satisfied by all points of $S$ (rather than merely by all
points $x$ of $S$ which satisfy $Cx \le d$), and so they can be passed to any other
subproblem later on. Padberg and Rinaldi PD] termed this approach
branch-and-cut.



2 Ways of Finding Cuts 



The symmetric traveling salesman problem, or TSP for short, is this: given a 
finite number of “cities” along with the cost of travel between each pair of them, 
find the cheapest way of visiting all of the cities and returning to your starting 
point. The travel costs are symmetric in the sense that traveling from city X 
to city Y costs just as much as traveling from Y to X; the “way of visiting all 
the cities” is simply the order in which the cities are visited. This problem is a 
special case of (1) with $m = n(n-1)/2$, where $n$ is the number of the cities and
S consists of the set of incidence vectors of all the hamiltonian cycles through 
the set V of the n cities; in this context, hamiltonian cycles are commonly called 
tours. Dantzig, Fulkerson, and Johnson illustrated the power of their cutting- 
plane method by solving a 49-city instance of the TSP, an impressive size at the 
time. They let the initial polyhedron P consist of all vectors x, with components 
subscripted by edges of the complete graph on V, that satisfy 

$$0 \le x_e \le 1 \text{ for all edges } e \tag{4}$$

and

$$\sum (x_e : v \in e) = 2 \text{ for all cities } v. \tag{5}$$

(Throughout this paper, we treat the edges of a graph as two-element subsets
of its vertex-set: $v \in e$ means that vertex $v$ is an endpoint of edge $e$; $e \cap Q \ne \emptyset$
means that edge $e$ has an endpoint in set $Q$; $e - Q \ne \emptyset$ means that edge $e$
has an endpoint outside set $Q$; and so on.) All but two of their cuts have the
form $\sum (x_e : e \cap Q \ne \emptyset,\ e - Q \ne \emptyset) \ge 2$, where $Q$ is a nonempty proper subset
of $V$; they are satisfied by all tours through $V$ because every such tour has to
move from $Q$ to $V - Q$ at least once and it has to move back to $Q$ after each
such crossing. Dantzig, Fulkerson, and Johnson called such inequalities “loop 
constraints”; nowadays, they are commonly referred to as “subtour elimination 
inequalities”; we are going to call them simply subtour inequalities. (As for the 
two exceptional cuts, Dantzig, Fulkerson, and Johnson give ad hoc combinatorial 
arguments to show that these inequalities are satisfied by incidence vectors of 
all tours through the 49 cities and, in a footnote, they say “We are indebted to 
I. Glicksberg of Rand for pointing out relations of this kind to us”.)

An important class of problems (1) are the integer linear programming problems,
where $S$ is specified as the set of all integer solutions of some explicitly
recorded system of linear constraints. For this class, Gomory [26,27,28] designed
fast procedures for generating cuts from the optimal simplex basis (and proved 
that systematic use of these cuts makes the cutting-plane method terminate); 
cuts generated by these procedures are called Gomory cuts. 

If an LP relaxation of a TSP instance includes all constraints (4), (5), then
a nonempty set of cuts can be found routinely whenever $x^* \notin S$: on the one
hand, if x* is not an integer vector, then it violates a Gomory cut; on the other 
hand, if x* is an integer vector, then it is the incidence vector of the edge-set 
of a disconnected graph and each connected component of this graph yields a 
subtour cut. The first computer code for solving the TSP by the cutting-plane 
method, written by Martin m, adopts this policy: some of its cuts are subtour 
inequalities and others are generated by a variation on Gomory’s theme described 
in Martin |35]. In subsequent TSP codes, subtour inequalities became a stock 
item, but Gomory cuts fell into disuse when a different paradigm for finding cuts 
took over. 
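
The integer case of this argument is easy to make concrete. The following Python sketch, written under the assumption that x* is an integer vector satisfying (4) and (5), finds the connected components of its support graph; each component Q of a disconnected graph gives a violated subtour inequality. The function name and the data layout are choices made for this example.

```python
def subtour_cuts(n, x_star, tol=1e-9):
    """Return vertex sets Q whose subtour inequality is violated by x*.

    n:      number of cities, labelled 0 .. n-1
    x_star: dict mapping frozenset({u, v}) -> value, assumed integral here
    """
    # Adjacency lists of the support graph {e : x*_e > 0}.
    adj = {v: [] for v in range(n)}
    for e, value in x_star.items():
        if value > tol:
            u, v = tuple(e)
            adj[u].append(v)
            adj[v].append(u)
    # Depth-first search for connected components.
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v])
        seen |= comp
        components.append(comp)
    # A single component means the support graph is connected: no cut here.
    return components if len(components) > 1 else []


# Two disjoint triangles on six cities satisfy (4) and (5) but form no tour.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
x_star = {frozenset(e): 1 for e in edges}
print(subtour_cuts(6, x_star))  # [{0, 1, 2}, {3, 4, 5}]
```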

By a template, we mean a set of linear inequalities; we say that a cut matches 
the template if it belongs to the set. By the template paradigm, we mean the 
following two-part procedure used in the design of branch-and-cut algorithms: 

(i) describe one or more templates of linear inequalities that are satisfied by all 
the points of S, 

(ii) for each template described in part (i), design an efficient separation algo- 
rithm that, given an x*, attempts to find a cut that matches the template. 

The separation algorithms in (ii) may be exact in the sense of finding a cut that 
separates x* from S and matches the template whenever one exists and they 
may be heuristic in the sense of sometimes failing to find such a cut even though 
one exists. 

The primordial template of TSP cuts is the set of subtour inequalities; an
exact separation algorithm for this template has been pointed out by Hong [39].
Next came the template of “blossom inequalities”, introduced by Edmonds [21]
in the context of 2-matchings and used in the branch-and-cut TSP computer code
written by Hong [39]; then came the more general template of “comb inequalities”,
first used by Grötschel in his solution of a 120-city TSP instance by
the cutting-plane method. To describe these templates, let us define, for every
vector $x$ with components subscripted by edges of the complete graph on $V$ and
for every pair $A$, $B$ of disjoint subsets of $V$,

$$x(A, B) = \sum (x_e : e \cap A \ne \emptyset,\ e \cap B \ne \emptyset).$$

In this notation, subtour inequalities are recorded as $x(Q, V - Q) \ge 2$; a comb
inequality is any inequality

$$x(H, V - H) + \sum_{i=1}^{k} x(T_i, V - T_i) \ge 3k + 1, \tag{6}$$

where

• $H, T_1, T_2, \ldots, T_k$ are subsets of $V$,

• $T_1, T_2, \ldots, T_k$ are pairwise disjoint,

• $T_i \cap H \ne \emptyset$ and $T_i - H \ne \emptyset$ for all $i = 1, 2, \ldots, k$,

• $k$ is odd.

To see that every comb inequality is satisfied by all characteristic
vectors $x$ of tours through $V$, let $j$ denote the number of sets $T_i$ that satisfy
$x(T_i, V - T_i) = 2$. In this notation, $x(H, V - H) \ge j$; furthermore, if $j = k$, then
$x(H, V - H) \ge k + 1$ as $x(H, V - H)$ is even and $k$ is odd; since each $x(T_i, V - T_i)$ is
a positive even integer, we have $\sum_{i=1}^{k} x(T_i, V - T_i) \ge 2j + 4(k - j)$ and (6) follows.

A blossom inequality is a comb inequality with $|T_i| = 2$ for all $i = 1, 2, \ldots, k$.
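
To make the notation concrete, the Python sketch below evaluates the left-hand side of a comb inequality on the incidence vector of a tour. The function names and the particular six-city comb (handle {0,1,2} with three two-element teeth, hence a blossom inequality) are assumptions chosen for illustration.

```python
def cut_value(x, A):
    """x(A, V - A): total x-value of edges with exactly one endpoint in A."""
    return sum(val for e, val in x.items() if len(e & A) == 1)


def comb_lhs(x, handle, teeth):
    """Left-hand side of the comb inequality with the given handle and teeth."""
    return cut_value(x, handle) + sum(cut_value(x, T) for T in teeth)


def tour_vector(order):
    """Incidence vector of the tour visiting the cities in the given order."""
    n = len(order)
    return {frozenset((order[i], order[(i + 1) % n])): 1 for i in range(n)}


# A comb on V = {0,...,5} with k = 3 teeth; its right-hand side is 3k + 1 = 10.
handle = frozenset({0, 1, 2})
teeth = [frozenset({0, 3}), frozenset({1, 4}), frozenset({2, 5})]
x = tour_vector([0, 3, 4, 1, 2, 5])
print(comb_lhs(x, handle, teeth), ">=", 3 * len(teeth) + 1)  # 10 >= 10
```

For this tour the handle is crossed four times and each tooth twice, so the inequality holds with equality.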

Just like the subtour inequalities, the blossom inequalities, and the comb
inequalities, more complicated templates of linear inequalities that are satisfied by
all characteristic vectors $x$ of tours through $V$ are often described as hypergraph
inequalities

$$\sum (\lambda_Q\, x(Q, V - Q) : Q \in \mathcal{H}) \ge \mu$$

where $\mathcal{H}$ is a collection of subsets of $V$ (also known as a hypergraph on $V$) and
$\lambda_Q$ ($Q \in \mathcal{H}$) and $\mu$ are integers. Naddef reviews a number of templates of
hypergraph inequalities satisfied by all characteristic vectors $x$ of tours through
$V$.

TSP codes of Miliotis [48], Land [d^, and Fleischmann |2^ used the template
paradigm as their preferred source of cuts; whenever this source dried up, they
provisionally switched to Gomory cuts. Grötschel and Padberg [35,36] and Padberg
and Hong advocated a different policy, where the template paradigm
is the only source of cuts; furthermore, they argued for drawing the templates
exclusively from the set of linear inequalities that induce facets of the traveling
salesman polytope, meaning the convex hull of $S$. These policies were adopted in
TSP codes of Crowder and Padberg m, Grötschel and Holland, and Padberg
and Rinaldi mm, which produced the most impressive computational
TSP successes of the nineteen eighties. The template paradigm is also the frame
of reference for other papers related to TSP codes, such as Carr [13], Christof
and Reinelt m, Clochard and Naddef m, Cornuéjols, Fonlupt, and Naddef
in. Fleischer and Tardos m, Grotschel and Pulleyblank m, Letchford m, 
Naddef and Thienel PEg, Padberg and Rao PI. and Padberg and Rinaldi m 

E2- 

We have written a branch-and-cut computer code for solving the TSP. Its
initial version was written in 1994; we presented some of its aspects at the 15th
International Symposium on Mathematical Programming held at the University
of Michigan in 1994, described them in Applegate et al. [1], and eventually
distributed the code as Concorde 97.08.27 at the 16th ISMP held at the École
Polytechnique Fédérale de Lausanne in 1997. A later version was written in 1997;
we presented some of its aspects at the 16th ISMP, outlined them in Applegate
et al. [2], and eventually made the code available on the internet, as Concorde
99.12.15, at Applegate et al. [^. In the initial 1994 version, cuts were found
partly by following the template paradigm (with a couple of our own separation
heuristics for comb inequalities thrown in) and partly by a couple of new techniques
with the common theme of innovating obsolete cuts. In the later 1997
version, we incorporated a technique that disdains all understanding of the traveling
salesman polytope and bashes on regardless of all prescribed templates.
This departure from the template paradigm is the subject of the present paper.

3 Cuts, Tours, and Shrinking 

Let $x^*$ denote the optimal solution of the current LP relaxation that has been
returned by the LP solver. In computer codes that search for TSP cuts, it is
common practice - Land m, Padberg and Hong [SS|, Crowder and Padberg m,
Padberg and Rinaldi ISM, Grötschel and Holland [32], Naddef and Thienel
PM - to first reduce the size of the problem by shrinking pairwise disjoint
subsets of the set $V$ of all cities into singletons and then to look for cuts in the
shrunk version of $x^*$.

Shrinking is an intuitive concept; to illustrate it on a toy-size example, let
us begin with $V = \{0, 1, 2, \ldots, 9\}$. Shrinking each of the sets $\{0,6,7\}$, $\{1,8\}$,
and $\{2,9\}$ into a single vertex - $\{0,6,7\}$ into 0, $\{1,8\}$ into 1, $\{2,9\}$ into 2 -
reduces each vector $x$ with components $x_{01}, x_{02}, \ldots, x_{89}$ (edges are two-point
sets, but we prefer writing $x_{ij}$ to writing $x_{\{i,j\}}$) to the vector $\bar{x}$ with components
$\bar{x}_{01}, \bar{x}_{02}, \ldots, \bar{x}_{45}$ defined by

$$\begin{aligned}
\bar{x}_{01} &= x_{01} + x_{08} + x_{16} + x_{17} + x_{68} + x_{78}, \\
\bar{x}_{02} &= x_{02} + x_{09} + x_{26} + x_{27} + x_{69} + x_{79}, \\
\bar{x}_{03} &= x_{03} + x_{36} + x_{37}, \\
\bar{x}_{04} &= x_{04} + x_{46} + x_{47}, \\
\bar{x}_{05} &= x_{05} + x_{56} + x_{57}, \\
\bar{x}_{12} &= x_{12} + x_{19} + x_{28} + x_{89}, \\
\bar{x}_{13} &= x_{13} + x_{38}, \\
\bar{x}_{14} &= x_{14} + x_{48}, \\
\bar{x}_{15} &= x_{15} + x_{58}, \\
\bar{x}_{23} &= x_{23} + x_{39}, \\
\bar{x}_{24} &= x_{24} + x_{49}, \\
\bar{x}_{25} &= x_{25} + x_{59}, \\
\bar{x}_{34} &= x_{34}, \\
\bar{x}_{35} &= x_{35}, \\
\bar{x}_{45} &= x_{45}.
\end{aligned}$$

In particular, if $x^*$ is defined by

$$\begin{aligned}
&x^*_{01} = 0.3,\ x^*_{02} = 0.3,\ x^*_{06} = 0.7,\ x^*_{07} = 0.7,\ x^*_{14} = 0.5,\ x^*_{16} = 0.2,\ x^*_{18} = 1.0, \\
&x^*_{25} = 0.5,\ x^*_{27} = 0.2,\ x^*_{29} = 1.0,\ x^*_{34} = 0.5,\ x^*_{35} = 0.5,\ x^*_{36} = 0.5,\ x^*_{37} = 0.5, \\
&x^*_{45} = 0.5,\ x^*_{48} = 0.5,\ x^*_{59} = 0.5,\ x^*_{67} = 0.6,\ x^*_{89} = 0.5,
\end{aligned}$$

and $x^*_{ij} = 0$ for all the remaining $i$ and $j$ such that $0 \le i < j \le 9$, then its
shrunk version $\bar{x}^*$ is defined by

$$\begin{aligned}
&\bar{x}^*_{01} = 0.5,\ \bar{x}^*_{02} = 0.5,\ \bar{x}^*_{12} = 0.5, \\
&\bar{x}^*_{03} = \bar{x}^*_{14} = \bar{x}^*_{25} = 1.0, \\
&\bar{x}^*_{34} = 0.5,\ \bar{x}^*_{35} = 0.5,\ \bar{x}^*_{45} = 0.5,
\end{aligned}$$

and $\bar{x}^*_{ij} = 0$ for all the remaining $i$ and $j$ such that $0 \le i < j \le 5$.
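
The shrinking operation of the toy example can be reproduced with a short Python function. The sketch below is an illustration only: the function name and data layout are assumptions, and the values of x* are those of the example under the assumption that x*_89 = 0.5, which is the value that gives every city degree 2.

```python
from collections import defaultdict


def shrink(x, rep):
    """Shrink x according to rep, which maps each city to its representative.

    x:   dict mapping (i, j) with i < j to the value x_ij
    rep: dict mapping every city to the vertex it is shrunk into
    Edges whose endpoints fall into the same shrunk vertex are dropped.
    """
    shrunk = defaultdict(float)
    for (i, j), value in x.items():
        a, b = sorted((rep[i], rep[j]))
        if a != b:
            shrunk[(a, b)] += value
    return dict(shrunk)


# The toy example: {0,6,7} -> 0, {1,8} -> 1, {2,9} -> 2.
rep = {0: 0, 6: 0, 7: 0, 1: 1, 8: 1, 2: 2, 9: 2, 3: 3, 4: 4, 5: 5}
x_star = {(0, 1): 0.3, (0, 2): 0.3, (0, 6): 0.7, (0, 7): 0.7, (1, 4): 0.5,
          (1, 6): 0.2, (1, 8): 1.0, (2, 5): 0.5, (2, 7): 0.2, (2, 9): 1.0,
          (3, 4): 0.5, (3, 5): 0.5, (3, 6): 0.5, (3, 7): 0.5, (4, 5): 0.5,
          (4, 8): 0.5, (5, 9): 0.5, (6, 7): 0.6, (8, 9): 0.5}
for edge, value in sorted(shrink(x_star, rep).items()):
    print(edge, round(value, 10))
```

The nine printed components agree with the shrunk vector listed above: 0.5 on the edges 01, 02, 12, 34, 35, 45 and 1.0 on the edges 03, 14, 25.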



Looking for cuts in the shrunk version $\bar{x}^*$ of $x^*$ does not mean looking for
linear inequalities satisfied by all tours through the shrunk version $\bar{V}$ of $V$ and
violated by $\bar{x}^*$. In our toy example, the inequality

$$\bar{x}_{01} + \bar{x}_{02} + \bar{x}_{12} + \bar{x}_{03} + \bar{x}_{14} + \bar{x}_{25} \le 4$$

is satisfied by all tours through $\bar{V}$ - which is $\{0, 1, 2, \ldots, 5\}$ - and violated by
$\bar{x}^*$; through substitution from the definitions of $\bar{x}_{ij}$, it yields the inequality

$$\begin{aligned}
& x_{01} + x_{08} + x_{16} + x_{17} + x_{68} + x_{78} \\
&+ x_{02} + x_{09} + x_{26} + x_{27} + x_{69} + x_{79} \\
&+ x_{12} + x_{19} + x_{28} + x_{89} \\
&+ x_{03} + x_{36} + x_{37} \\
&+ x_{14} + x_{48} \\
&+ x_{25} + x_{59} \le 4,
\end{aligned}$$

which is not satisfied by the tour 0-2-5-9-1-4-8-7-6-3-0.
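
Both claims can be checked by brute force. The Python sketch below, whose names are assumptions made for this example, enumerates all tours through the six shrunk vertices to confirm that the left-hand side never exceeds 4, and then evaluates the substituted inequality on the ten-city tour.

```python
from itertools import permutations

# Edges of the inequality on the shrunk vertex set {0,...,5}.
EDGES = {frozenset(e) for e in [(0, 1), (0, 2), (1, 2), (0, 3), (1, 4), (2, 5)]}


def lhs_shrunk(order):
    """Number of edges of the tour (given as a cyclic order) lying in EDGES."""
    n = len(order)
    tour = {frozenset((order[i], order[(i + 1) % n])) for i in range(n)}
    return len(tour & EDGES)


# Every tour through six vertices can be written as 0 followed by a
# permutation of 1..5, so this enumeration covers all of them.
worst = max(lhs_shrunk((0,) + p) for p in permutations(range(1, 6)))
print(worst)   # 4

# The substituted inequality has coefficient 1 on every edge of V whose
# shrunk image is one of EDGES.
rep = {0: 0, 6: 0, 7: 0, 1: 1, 8: 1, 2: 2, 9: 2, 3: 3, 4: 4, 5: 5}
tour = [0, 2, 5, 9, 1, 4, 8, 7, 6, 3]
images = [frozenset((rep[tour[i]], rep[tour[(i + 1) % 10]])) for i in range(10)]
lifted = sum(img in EDGES for img in images)
print(lifted)  # 9
```

Nine of the ten tour edges carry coefficient 1 (only the edge 67, which is shrunk away, does not), so the left-hand side is 9 and the inequality fails.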

In this example, shrinking {0,6,7} into 0, shrinking {1,8} into 1, and shrinking
{2,9} into 2 reduces the tour 0-2-5-9-1-4-8-7-6-3-0 to the spanning closed
walk 0-2-5-2-1-4-1-0-3-0; it reduces the incidence vector $x$ of the tour to the
vector $\bar{x}$ such that

$$\bar{x}_{01} = 1,\quad \bar{x}_{02} = 1,\quad \bar{x}_{03} = 2,\quad \bar{x}_{12} = 1,\quad \bar{x}_{14} = 2,\quad \bar{x}_{25} = 2,$$

and $\bar{x}_{ij} = 0$ for all the remaining $i$ and $j$ such that $0 \le i < j \le 5$. In general,
shrinking $V$ into $\bar{V}$ reduces a tour through $V$ to a spanning closed walk through
$\bar{V}$; it reduces the incidence vector $x$ of the tour to a vector $\bar{x}$ such that

• each $\bar{x}_e$ is a nonnegative integer,

• the graph with vertex-set $\bar{V}$ and edge-set $\{e : \bar{x}_e > 0\}$ is connected,

• $\sum(\bar{x}_e : v \in e)$ is even whenever $v \in \bar{V}$;

we will refer to the set of all the vectors $\bar{x}$ with these properties as tangled
tours through $\bar{V}$. This notion, but not the term, was introduced by Cornuéjols,
Fonlupt, and Naddef; they refer to the convex hull of the set of all tangled
tours through a prescribed set as the graphical traveling salesman polytope. Every
inequality $\sum \bar{a}_{ij}\bar{x}_{ij} \le b$ yields, through substitution from the definitions of $\bar{x}_{ij}$,
an inequality $\sum a_{ij}x_{ij} \le b$; if the original inequality is satisfied by all tangled
tours through $\bar{V}$, then the new inequality is satisfied by all tours through $V$; if
the original inequality is violated by the shrunk version $\bar{x}^*$ of $x^*$, then the new
inequality is violated by $x^*$. In our toy example, the blossom inequality

$$\bar{x}(\{0,1,2\},\{3,4,5\})$$
$$+\, \bar{x}(\{0,3\},\{1,2,4,5\})$$
$$+\, \bar{x}(\{1,4\},\{0,2,3,5\})$$
$$+\, \bar{x}(\{2,5\},\{0,1,3,4\}) \ge 10$$

is satisfied by all tangled tours through $\bar{V}$ and violated by $\bar{x}^*$; it yields, through
substitution from the definitions of $\bar{x}_{ij}$, the comb inequality

$$x(\{0,1,2,6,7,8,9\},\{3,4,5\})$$
$$+\, x(\{0,3,6,7\},\{1,2,4,5,8,9\})$$
$$+\, x(\{1,4,8\},\{0,2,3,5,6,7,9\})$$
$$+\, x(\{2,5,9\},\{0,1,3,4,6,7,8\}) \ge 10,$$

which is satisfied by all tours through $V$ and violated by $x^*$.

As this example suggests, the change of variable from $\bar{x}$ to $x$ is particularly
easy to implement in hypergraph inequalities: substitution from the definitions of
$\bar{x}_{ij}$ converts each linear function $\bar{x}(\bar{Q}, \bar{V} - \bar{Q})$ to the linear function $x(Q, V - Q)$
where $Q$ is the set of all cities that are mapped into $\bar{Q}$ by the function that
shrinks $V$ onto $\bar{V}$.
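As a small illustration of this change of variable, the Python sketch below replaces each shrunk set by its preimage under the shrinking map and evaluates both sides on the tour 0-2-5-9-1-4-8-7-6-3-0 and on its shrunk closed walk. The helper names `preimage` and `cut_weight` are assumptions made for this example only.

```python
def preimage(q_bar, rep):
    """Cities that the shrinking map rep sends into the set q_bar."""
    return {city for city, image in rep.items() if image in q_bar}

def cut_weight(x, q):
    """x(Q, V - Q): total weight of the edges with exactly one endpoint in Q."""
    return sum(v for (i, j), v in x.items() if (i in q) != (j in q))

rep = {0: 0, 6: 0, 7: 0, 1: 1, 8: 1, 2: 2, 9: 2, 3: 3, 4: 4, 5: 5}
blossom_sets = [{0, 1, 2}, {0, 3}, {1, 4}, {2, 5}]
comb_sets = [preimage(q, rep) for q in blossom_sets]

# Incidence vector of the tour and of its shrunk closed walk (taken from the text).
tour = {(0, 2): 1, (2, 5): 1, (5, 9): 1, (1, 9): 1, (1, 4): 1,
        (4, 8): 1, (7, 8): 1, (6, 7): 1, (3, 6): 1, (0, 3): 1}
walk = {(0, 1): 1, (0, 2): 1, (0, 3): 2, (1, 2): 1, (1, 4): 2, (2, 5): 2}

comb_lhs = sum(cut_weight(tour, q) for q in comb_sets)
blossom_lhs = sum(cut_weight(walk, q) for q in blossom_sets)
print(comb_sets, comb_lhs, blossom_lhs)
```

Both left-hand sides agree, as the substitution argument predicts, and both are at least 10.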

The subject of the present paper is a technique where, rather than first
shrinking $x^*$ once and for all onto a $\bar{V}$ that may be large and then attempting
to find many cuts in the single $\bar{x}^*$, we shrink $x^*$ many times onto sets $\bar{V}$ that
are small (their typical size is at most thirty or so) and we find precisely one cut
in each $\bar{x}^*$ that lies outside the graphical traveling salesman polytope on $\bar{V}$: see
Algorithm 3.1.

The forthcoming Sect. 4 takes up about half of the paper and describes
the collection of commonplace techniques that we have chosen to implement
the body of the for loop in Algorithm 3.1 in Concorde, our computer code (a
more detailed description will be presented in Applegate et al.); the very
short Sect. 5 describes Concorde's implementation of the control of the for loop.
The sections that follow Sect. 5 attempt to give an idea of how useful the inclusion
of Algorithm 3.1 in Concorde turned out to be, place Algorithm 3.1 in a broader
context, and point out the potential for more general uses of the machinery
assembled in Sect. 4.

Algorithm 3.1. A scheme for collecting TSP cuts

initialize an empty list $\mathcal{L}$ of cuts;
for selected small integers $k$ and
    partitions of $V$ into nonempty sets $V_0, V_1, \ldots, V_k$
do  $\bar{x}^*$ = the vector obtained from $x^*$ by shrinking each $V_i$ into singleton $i$;
    $\bar{V} = \{0, 1, \ldots, k\}$;
    if $\bar{x}^*$ lies outside the graphical traveling salesman polytope on $\bar{V}$
    then find a hypergraph inequality that is
             satisfied by all tangled tours through $\bar{V}$ and violated by $\bar{x}^*$,
         change its variable from $\bar{x}$ to $x$, and
         add the resulting hypergraph inequality to $\mathcal{L}$;
    end
end
return $\mathcal{L}$;
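The control flow of this scheme can be sketched in Python as follows. This is only an outline under stated assumptions: `separate` is a hypothetical callback standing in for the whole of Sect. 4 (it returns a list of shrunk vertex sets with a right-hand side, or `None`), and a hypergraph inequality is represented simply as a list of sets together with its right-hand side.

```python
def collect_cuts(x_star, partitions, separate):
    """Outline of the for loop; `partitions` yields lists [V_0, ..., V_k] of city sets."""
    cuts = []
    for parts in partitions:
        # rep maps each city to the index of the part that contains it
        rep = {city: i for i, part in enumerate(parts) for city in part}
        x_bar = {}
        for (i, j), value in x_star.items():
            a, b = sorted((rep[i], rep[j]))
            if a != b:  # edges inside a part vanish when the part is shrunk
                x_bar[(a, b)] = x_bar.get((a, b), 0) + value
        found = separate(x_bar, len(parts) - 1)  # hypothetical separation routine
        if found is not None:
            sets_bar, rhs = found
            # change of variable: replace each shrunk set by its preimage
            lifted = [{c for c in rep if rep[c] in q} for q in sets_bar]
            cuts.append((lifted, rhs))
    return cuts
```

A caller would supply the partitions (the subject of Sect. 5) and a separation routine; the sketch deliberately leaves both open.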

4 Processing x* 

We will illustrate Concorde's implementation of the body of the for loop in
Algorithm 3.1 on the example where $V = \{0,1,\ldots,7\}$ and

$$x^*_{01} = 0.7,\quad x^*_{02} = 0.1,\quad x^*_{04} = 0.9,\quad x^*_{07} = 0.9,\quad x^*_{12} = 0.7,$$

$$x^*_{15} = 0.6,\quad x^*_{23} = 0.8,\quad x^*_{25} = 0.4,\quad x^*_{34} = 1.0,\quad x^*_{35} = 0.1,$$

$$x^*_{36} = 0.1,\quad x^*_{47} = 0.1,\quad x^*_{56} = 0.9,\quad x^*_{67} = 1.0,$$

and $x^*_{ij} = 0$ for all the remaining $i$ and $j$ such that $0 \le i < j \le 7$. (Throughout
this section, $x^*$ denotes the shrunk vector produced in the body of the for loop
and $V$ denotes the shrunk vertex set $\{0,1,\ldots,k\}$.)

4.1 Does $x^*$ Lie Outside the Graphical Traveling Salesman
Polytope?

Rephrasing the Question. In our example, every tangled tour $x$ through $V$
satisfies the fourteen inequalities

$$x_{03} \ge 0,\ x_{05} \ge 0,\ x_{06} \ge 0,\ x_{13} \ge 0,\ x_{14} \ge 0,$$

$$x_{16} \ge 0,\ x_{17} \ge 0,\ x_{24} \ge 0,\ x_{26} \ge 0,\ x_{27} \ge 0,$$

$$x_{37} \ge 0,\ x_{45} \ge 0,\ x_{46} \ge 0,\ x_{57} \ge 0$$

and the seven inequalities

$$x_{01} + x_{12} + x_{13} + x_{14} + x_{15} + x_{16} + x_{17} \ge 2,$$

$$x_{02} + x_{12} + x_{23} + x_{24} + x_{25} + x_{26} + x_{27} \ge 2,$$

$$x_{03} + x_{13} + x_{23} + x_{34} + x_{35} + x_{36} + x_{37} \ge 2,$$

$$x_{04} + x_{14} + x_{24} + x_{34} + x_{45} + x_{46} + x_{47} \ge 2,$$

$$x_{05} + x_{15} + x_{25} + x_{35} + x_{45} + x_{56} + x_{57} \ge 2,$$

$$x_{06} + x_{16} + x_{26} + x_{36} + x_{46} + x_{56} + x_{67} \ge 2,$$

$$x_{07} + x_{17} + x_{27} + x_{37} + x_{47} + x_{57} + x_{67} \ge 2$$

and the two inequalities

$$x_{03} + x_{04} + x_{13} + x_{14} + x_{23} + x_{24} + x_{35} + x_{36} + x_{37} + x_{45} + x_{46} + x_{47} \ge 2,$$

$$x_{06} + x_{07} + x_{16} + x_{17} + x_{26} + x_{27} + x_{36} + x_{37} + x_{46} + x_{47} + x_{56} + x_{57} \ge 2;$$

since $x^*$ satisfies each of these twenty-three inequalities as an equation, it is a
convex combination of tangled tours through $V$ if and only if it is a convex
combination of those tangled tours through $V$ that satisfy each of the twenty-three
inequalities as an equation.
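The count of twenty-three can be reproduced with a few lines of Python. In this sketch the values of $x^*$ are stored in tenths so that all arithmetic is exact; the names `X_STAR` and `cut_weight` are illustrative assumptions.

```python
from itertools import combinations

# x* of the illustrative example, in tenths (so 2 corresponds to 20).
X_STAR = {(0, 1): 7, (0, 2): 1, (0, 4): 9, (0, 7): 9, (1, 2): 7,
          (1, 5): 6, (2, 3): 8, (2, 5): 4, (3, 4): 10, (3, 5): 1,
          (3, 6): 1, (4, 7): 1, (5, 6): 9, (6, 7): 10}

def cut_weight(x, q):
    """x(Q, V - Q): total weight of the edges with exactly one endpoint in Q."""
    return sum(v for (i, j), v in x.items() if (i in q) != (j in q))

edges = list(combinations(range(8), 2))
zero_edges = [e for e in edges if X_STAR.get(e, 0) == 0]                  # x_e >= 0 tight
tight_degrees = [u for u in range(8) if cut_weight(X_STAR, {u}) == 20]    # degree = 2
tight_edges = [e for e in edges if cut_weight(X_STAR, set(e)) == 20]      # x(e, V - e) = 2
print(len(zero_edges), tight_degrees, tight_edges)
```

The three lists have fourteen, seven, and two members, respectively.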

In general, every tangled tour $x$ through $V$ satisfies the inequalities

    $x_e \ge 0$ for all $e$,
    $\sum(x_e : u \in e) \ge 2$ for all $u$,
    $x(e, V - e) \ge 2$ for all $e$,

and so $x^*$ is a convex combination of tangled tours through $V$ if and only if it is a
convex combination of tangled tours through $V$ that satisfy

    $x_e = 0$ for all $e$ such that $x^*_e = 0$,    (7)

    $\sum(x_e : u \in e) = 2$ for all $u$ such that $\sum(x^*_e : u \in e) = 2$,    (8)

    $x(e, V - e) = 2$ for all $e$ such that $x^*(e, V - e) = 2$.    (9)

Concorde chooses $V_0, V_1, \ldots, V_k$ so that, before shrinking, the edges with exactly
one endpoint in $V_i$ have total weight 2 for all $i = 1, 2, \ldots, k$
(but not necessarily for $i = 0$), and so

    $\sum(x^*_e : i \in e) = 2$ for all $i$ in $\{1, 2, \ldots, k\}$.

(Our machinery for processing $x^*$ is predicated on this property of $x^*$, but could
be easily modified to handle all $x^*$; see the later discussion of more general uses
of this machinery.) In this case, every solution $x$ of (8), (9) satisfies

    $\sum(x_e : u \in e) = 2$ for all $u$ in $\{1, 2, \ldots, k\}$,    (10)

    $x_e = 1$ for all $e$ such that $e \subseteq \{1, 2, \ldots, k\}$ and $x^*_e = 1$,    (11)

and so

    $x^*$ is a convex combination of tangled tours through $V$
    if and only if it is a convex combination of tangled tours through $V$
    that satisfy (7), (10), (11).

We will refer to tangled tours that satisfy (7), (10), (11) as strongly constrained.



Delayed Generation of Strongly Constrained Tangled Tours. Unfortunately,
the strongly constrained tangled tours through $V$ may be too numerous
to be listed one by one; fortunately, $x^*$ can be tested for membership in the
convex hull of the set of strongly constrained tangled tours through $V$ without
listing these tangled tours one by one. The trick, introduced by Ford and
Fulkerson and by Jewell, is known as delayed column generation; we
use it in Algorithm 4.1. This algorithm returns either the message "$x^*$ is in the
graphical traveling salesman polytope on $V$" or a vector $a$ and a scalar $b$ such
that the inequality $a^T x \le b$ is satisfied by all strongly constrained tangled tours
$x$ through $V$ and violated by $x^*$.

Algorithm 4.1. Testing the if condition in Algorithm 3.1

if there is a strongly constrained tangled tour $x$ through $V$
then make $x$ the only specimen in a collection of
         strongly constrained tangled tours through $V$;
     repeat
         if some linear inequality $a^T x \le b$ is
                satisfied by all $x$ in the collection and violated by $x^*$
         then find a strongly constrained tangled tour $x$ through $V$
                  that maximizes $a^T x$;
              if $a^T x \le b$
              then return $a$ and $b$;
              else add $x$ to the collection;
              end
         else return the message
                  "$x^*$ is in the graphical traveling salesman polytope on $V$";
         end
     end
else return $0$ and $-1$;
end



In our illustrative example, Algorithm 4.1 may initialize the collection by the

• strongly constrained tangled tour 0-4-3-2-1-5-6-7-0

and then proceed through the following five iterations:

Iteration 1: Inequality

$$-x_{15} \le -1$$

is satisfied by all $x$ in the collection and violated by $x^*$. The
• strongly constrained tangled tour 0-1-2-0-4-3-5-6-7-0
maximizes $-x_{15}$. We add this tangled tour to the collection.

Iteration 2: Inequality

$$x_{25} \le 0$$

is satisfied by all $x$ in the collection and violated by $x^*$. The
• strongly constrained tangled tour 0-1-5-2-0-4-3-6-7-0
maximizes $x_{25}$. We add this tangled tour to the collection.

Iteration 3: Inequality

$$-x_{15} + x_{23} + x_{25} \le 0$$

is satisfied by all $x$ in the collection and violated by $x^*$. The
• strongly constrained tangled tour 0-1-0-4-3-2-5-6-7-0
maximizes $-x_{15} + x_{23} + x_{25}$. We add this tangled tour to the collection.

Iteration 4: Inequality

$$x_{47} \le 0$$

is satisfied by all $x$ in the collection and violated by $x^*$. The
• strongly constrained tangled tour 0-1-5-6-7-4-3-2-0
maximizes $x_{47}$. We add this tangled tour to the collection.

Iteration 5: Inequality

$$x_{12} + x_{25} + x_{47} \le 1 \qquad (12)$$

is satisfied by all $x$ in the collection and violated by $x^*$. The
• strongly constrained tangled tour 0-4-3-2-1-5-6-7-0
maximizes $x_{12} + x_{25} + x_{47}$. We conclude that (12) is satisfied by all strongly
constrained tangled tours and violated by $x^*$.
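The five iterations can be replayed in Python. The sketch below only checks the claims made in the record (each inequality holds on the current collection and is violated by $x^*$); it does not solve the linear program that produces the inequalities. Values of $x^*$ are again in tenths, and all names are illustrative assumptions.

```python
X_STAR = {(0, 1): 7, (0, 2): 1, (0, 4): 9, (0, 7): 9, (1, 2): 7,
          (1, 5): 6, (2, 3): 8, (2, 5): 4, (3, 4): 10, (3, 5): 1,
          (3, 6): 1, (4, 7): 1, (5, 6): 9, (6, 7): 10}   # tenths

def walk_vector(walk):
    """Edge multiplicities of a closed walk written as a string of digits."""
    x = {}
    for i, j in zip(walk, walk[1:]):
        e = (min(int(i), int(j)), max(int(i), int(j)))
        x[e] = x.get(e, 0) + 1
    return x

def lhs(coeffs, x):
    return sum(c * x.get(e, 0) for e, c in coeffs.items())

# Initial specimen followed by the tours added in Iterations 1 to 4.
tours = [walk_vector(w) for w in
         ["043215670", "0120435670", "0152043670", "0104325670", "015674320"]]

# (coefficients, right-hand side) of the inequality of each iteration.
cuts = [({(1, 5): -1}, -1),
        ({(2, 5): 1}, 0),
        ({(1, 5): -1, (2, 3): 1, (2, 5): 1}, 0),
        ({(4, 7): 1}, 0),
        ({(1, 2): 1, (2, 5): 1, (4, 7): 1}, 1)]

report = []
for t, (coeffs, rhs) in enumerate(cuts):
    collection = tours[:t + 1]                      # specimens known in iteration t+1
    holds = all(lhs(coeffs, x) <= rhs for x in collection)
    violated = lhs(coeffs, X_STAR) > 10 * rhs       # x* is stored in tenths
    report.append((holds, violated))
print(report)
```

In Iterations 1 to 4 the tour that is added next violates the current inequality, which is exactly why it has to join the collection.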



Implementing Algorithm 4.1. Let

    $E_{1/2}$ denote the set of all the edges $e$ such that
    $e \subseteq \{1, 2, \ldots, k\}$, $x^*_e \ne 0$, $x^*_e \ne 1$.

The significance of $E_{1/2}$ comes from the fact that every strongly constrained
tangled tour $x$ satisfies

    $x_{0u} = 2 - \sum(x_e : e \subseteq \{1, 2, \ldots, k\},\ u \in e)$ for all $u$ in $\{1, 2, \ldots, k\}$,

    $x_e = 0$ for all $e$ such that $e \subseteq \{1, 2, \ldots, k\}$ and $x^*_e = 0$,

    $x_e = 1$ for all $e$ such that $e \subseteq \{1, 2, \ldots, k\}$ and $x^*_e = 1$,

and so the condition

    some linear inequality $a^T x \le b$ is
    satisfied by all $x$ in the collection and violated by $x^*$

in Algorithm 4.1 is equivalent to the condition

    some linear inequality $a^T x \le b$ with $a_e = 0$ whenever $e \notin E_{1/2}$ is
    satisfied by all $x$ in the collection and violated by $x^*$.

To test this condition, Concorde solves a linear programming problem. With

    $\psi(x)$ standing for the restriction of $x$
    on its components indexed by elements of $E_{1/2}$,

with $A$ the matrix whose columns $\psi(x)$ come from specimens $x$ in the collection,
and with $e$ standing - as usual - for the vector $[1, 1, \ldots, 1]^T$ whose number of
components is determined by the context, this problem in variables $s, \lambda, w$ reads

$$\begin{array}{ll}
\text{maximize} & s \\
\text{subject to} & s\,\psi(x^*) - A\lambda + w = 0, \\
 & -s + e^T\lambda = 0, \\
 & \lambda \ge 0,\ -e \le w \le e.
\end{array} \qquad (13)$$

Since its constraints can be satisfied by setting $s = 0$, $\lambda = 0$, $w = 0$, problem
(13) either has an optimal solution or else it is unbounded. In the former case,
the simplex method applied to (13) also finds an optimal solution of its dual,

$$\begin{array}{ll}
\text{minimize} & e^T(u + v) \\
\text{subject to} & -a^T A + b\,e^T \ge 0, \\
 & a^T\psi(x^*) - b = 1, \\
 & a + u - v = 0, \\
 & u \ge 0,\ v \ge 0,
\end{array} \qquad (14)$$

and this optimal solution provides the $a$ and $b$ of Algorithm 4.1; in fact, it

$$\text{maximizes}\quad \frac{a^T x^* - b}{\|a\|_1}$$

subject to the constraints that $a^T x \le b$ for all $x$ in the collection and that
$a_e = 0$ whenever $e \notin E_{1/2}$. In the latter case, (14) is infeasible, and so no linear
inequality is satisfied by all $x$ in the collection and violated by $x^*$.

To find specimens $x$ for the collection, we use a function Oracle that, given
an integer vector $c$, returns either a strongly constrained tangled tour $x$ through
$V$ that maximizes $c^T x$ or the message "infeasible" indicating that no tangled
tour through $V$ is strongly constrained. Concorde implements Oracle as two
algorithms in tandem: if a primitive branch-and-bound algorithm fails to solve
the instance within a prescribed number of recursive calls of itself, then we switch
to a more sophisticated branch-and-cut algorithm. To reconcile

• the integer arithmetic of Oracle, which uses $a$ and $b$,
with
• the floating-point arithmetic of the simplex method, which finds $a$ and $b$,

Concorde uses the continued fraction method (see, for instance, Schrijver):
since Oracle uses integer arithmetic, the cut $a^T x \le b$ that separates $x^*$ from
all strongly constrained tangled tours through $V$ has an integer $a$ and an integer
right-hand side $b$.
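One way to picture the conversion from floating-point to integer data is the following Python sketch. It is not Concorde's routine: it relies on `Fraction.limit_denominator` from the Python standard library, which finds the closest fraction with a bounded denominator, and the bound `max_denominator=1000` is an arbitrary assumption made for the example.

```python
from fractions import Fraction
from math import lcm  # available in Python 3.9 and later

def integerize(a, b, max_denominator=1000):
    """Round floating-point a (a list) and b to nearby rationals and clear denominators."""
    fracs = [Fraction(v).limit_denominator(max_denominator) for v in list(a) + [b]]
    scale = lcm(*(f.denominator for f in fracs))      # common denominator
    ints = [int(f * scale) for f in fracs]
    return ints[:-1], ints[-1]

a_int, b_int = integerize([0.3333333333, 0.6666666667, 0.0], 0.3333333333)
print(a_int, b_int)
```

Because rounding may change the inequality slightly, the integer version still has to be validated by Oracle, which is what Algorithm 4.1 does when it tests whether the maximum of $a^T x$ exceeds $b$.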

4.2 Separating x* from the Graphical Traveling Salesman Polytope: 
The Three Phases 

If $x^*$ lies outside the graphical traveling salesman polytope, then Algorithm 4.1
returns a linear inequality that separates $x^*$ from all strongly constrained tangled
tours. To convert this inequality to a cut that separates $x^*$ from all tangled tours,
we proceed in three phases:

    in Phase 1, we find a linear inequality that
        separates $x^*$ from all moderately constrained tangled tours and
        induces a facet of the convex hull of these tangled tours,

    in Phase 2, we find a linear inequality that
        separates $x^*$ from all weakly constrained tangled tours and
        induces a facet of the convex hull of these tangled tours,

    in Phase 3, we find a linear inequality that
        separates $x^*$ from all tangled tours and
        induces a facet of the convex hull of weakly constrained tangled tours.

(In Phase 3, we could easily find a linear inequality that separates $x^*$ from all
tangled tours and induces a facet of the convex hull of all tangled tours; however,
such an inequality might not be acceptable to Concorde as a constraint of an
LP relaxation of the TSP instance. We will discuss this point in Sect. 4.5.)
The intermediate classes of moderately constrained tangled tours and weakly
constrained tangled tours are defined by reference to the sets of constraints

    (a) $\sum(x_e : u \in e) = 2$ for all $u$ in $\{1, 2, \ldots, k\}$;

    (b) $x_e = 0$ for all $e$ such that $e \subseteq \{1, 2, \ldots, k\}$ and $x^*_e = 0$,
        $x_e = 1$ for all $e$ such that $e \subseteq \{1, 2, \ldots, k\}$ and $x^*_e = 1$;

    (c) $x_{0u} = 0$ for all $u$ in $\{1, 2, \ldots, k\}$ such that $x^*_{0u} = 0$.

Specifically,

    strongly constrained tangled tours are those that satisfy (a), (b), (c);
    moderately constrained tangled tours are those that satisfy (a), (b);
    weakly constrained tangled tours are those that satisfy (a).

The principal tools used in Phase 1 are a constructive proof of a classic result
of Minkowski (stated in this paper as a theorem) and the Dinkelbach method of
fractional programming; Phase 2 is sequential lifting; Phase 3 is nearly effortless.
In Phase 1 and Phase 2, we use a function Oracle that,

    given integer vectors $c$, $\ell$, $u$ and a threshold $t$ (either an integer or $-\infty$),
    returns either
        a weakly constrained tangled tour $x$ that maximizes $c^T x$
        subject to $\ell \le x \le u$, $c^T x > t$
    or
        the message "infeasible" indicating that
        no weakly constrained tangled tour $x$ satisfies
        $\ell \le x \le u$, $c^T x > t$.

This is the same function that is used, with a fixed $\ell$ and a fixed $u$, to find items
$x$ for the collection in Algorithm 4.1.

4.3 Phase 1: From Strongly Constrained to Moderately Constrained 
Tangled Tours 

Separating $x^*$ from Moderately Constrained Tangled Tours. It is easy
to convert any linear inequality that separates $x^*$ from all strongly constrained
tangled tours to a linear inequality that separates $x^*$ from all moderately
constrained tangled tours. In our example, inequality (12)
is satisfied by all strongly constrained tangled tours $x$ and violated by $x^*$; if $x$
is a moderately constrained tangled tour, then $\psi(x)$ is a zero-one vector, and so

$$x_{12} + x_{25} + x_{47} \le 3;$$

a moderately constrained tangled tour $x$ is strongly constrained if and only if

$$x_{03} = 0,\quad x_{05} = 0,\quad x_{06} = 0.$$

It follows that the inequality

$$x_{12} + x_{25} + x_{47} - 2(x_{03} + x_{05} + x_{06}) \le 1$$

is satisfied by all moderately constrained tangled tours $x$ and violated by $x^*$:
a moderately constrained tangled tour that is not strongly constrained has
$x_{03} + x_{05} + x_{06} \ge 1$, and so its left-hand side is at most $3 - 2 = 1$. In
general, a moderately constrained tangled tour $x$ is strongly constrained if and
only if

    $x_{0u} = 0$ for all $u$ in $\{1, 2, \ldots, k\}$ such that $x^*_{0u} = 0$;

if

$$p^T\psi(x) \le r$$

is satisfied by all strongly constrained tangled tours $x$ and violated by $x^*$, then

$$p^T\psi(x) - (\|p\|_1 - r)\sum(x_{0u} : x^*_{0u} = 0) \le r \qquad (15)$$

is satisfied by all moderately constrained tangled tours $x$ and violated by $x^*$ (to
see that $\|p\|_1 - r$ is always positive, note that $r < p^T\psi(x^*) \le \|p\|_1$); the bulk of
Phase 1 is taken up by transforming this cut into a cut that induces a facet of
the convex hull of all moderately constrained tangled tours.
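The passage from $p^T\psi(x) \le r$ to inequality (15) is a purely mechanical step, sketched below in Python. The function name `to_moderate` and the dictionary representation of inequalities are assumptions made for this illustration.

```python
def to_moderate(p, r, zero_spokes):
    """Build (15) from p^T psi(x) <= r; zero_spokes lists the u with x*_{0u} = 0."""
    penalty = sum(abs(c) for c in p.values()) - r    # ||p||_1 - r, which is positive
    coeffs = dict(p)
    for u in zero_spokes:
        coeffs[(0, u)] = -penalty
    return coeffs, r

# Inequality (12) of the example: x12 + x25 + x47 <= 1, with x*_{03} = x*_{05} = x*_{06} = 0.
coeffs, rhs = to_moderate({(1, 2): 1, (2, 5): 1, (4, 7): 1}, 1, [3, 5, 6])

# The moderately constrained tangled tour 0-1-2-5-6-7-4-3-0 uses the edge 03 once.
tour = {(0, 1): 1, (1, 2): 1, (2, 5): 1, (5, 6): 1, (6, 7): 1, (4, 7): 1, (3, 4): 1, (0, 3): 1}
value = sum(c * tour.get(e, 0) for e, c in coeffs.items())
print(coeffs, rhs, value)
```

On this tour the left-hand side equals the right-hand side, so the tour satisfies the new inequality with the sign of equality.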

Dimension of the Set of Moderately Constrained Tangled Tours. In
Sect. 4.1, we defined

$$E_{1/2} = \{e : e \subseteq \{1, 2, \ldots, k\},\ x^*_e \ne 0,\ x^*_e \ne 1\}.$$

In fact, we may assume that

$$E_{1/2} = \{e : e \subseteq \{1, 2, \ldots, k\},\ 0 < x^*_e < 1\}:$$

otherwise $x^*_e > 1$ for some $e$ such that $e \subseteq \{1, 2, \ldots, k\}$, and so $x^*$ is separated
from all tangled tours by the subtour inequality $x(e, V - e) \ge 2$. (As noted by
Cornuéjols, Fonlupt, and Naddef, subtour inequalities induce facets of the
graphical traveling salesman polytope.) In addition,

    with $\psi(x)$ standing for the restriction of $x$ on $E_{1/2}$ just as in Sect. 4.1,

we may assume that each of the vectors

$$[0, 0, \ldots, 0]^T,\ [1, 0, \ldots, 0]^T,\ [0, 1, \ldots, 0]^T,\ \ldots,\ [0, 0, \ldots, 1]^T$$

with components indexed by elements of $E_{1/2}$ equals $\psi(x)$ for some moderately
constrained tangled tour $x$ (else $x^*$ is separated from all tangled tours by another
readily available subtour inequality), and so

    $\{\psi(x) : x$ is a moderately constrained tangled tour$\}$ has full dimension.

Since every moderately constrained tangled tour $x$ satisfies

    $x_{0u} = 2 - \sum(x_e : e \subseteq \{1, 2, \ldots, k\},\ u \in e)$ for all $u$ in $\{1, 2, \ldots, k\}$,

    $x_e = 0$ for all $e$ such that $e \subseteq \{1, 2, \ldots, k\}$ and $x^*_e = 0$,

    $x_e = 1$ for all $e$ such that $e \subseteq \{1, 2, \ldots, k\}$ and $x^*_e = 1$,

we conclude that

    the set of moderately constrained tangled tours has dimension $|E_{1/2}|$.



From a Cut to a Facet-Inducing Cut: An Overview. The optimal basis
of problem (13) provides an affinely independent set $\mathcal{I}$ of strongly constrained
tangled tours $x$ that satisfy (15) with the sign of equality. In our illustrative
example, $\mathcal{I}$ consists of

• the strongly constrained tangled tour 0-4-3-2-1-5-6-7-0,

• the strongly constrained tangled tour 0-1-2-0-4-3-5-6-7-0,

• the strongly constrained tangled tour 0-1-5-2-0-4-3-6-7-0,

• the strongly constrained tangled tour 0-1-0-4-3-2-5-6-7-0,

• the strongly constrained tangled tour 0-1-5-6-7-4-3-2-0.

In general, it may turn out that $\mathcal{I} = \emptyset$; this happens if and only if no tangled
tour through $V$ is strongly constrained, in which case (15) reads

$$-\sum(x_{0u} : x^*_{0u} = 0) \le -1.$$

Writing $a^T x \le b$ for (15), we have an integer vector $a$, an integer $b$, and a
(possibly empty) set $\mathcal{I}$ such that

    (Inv1) all moderately constrained tangled tours $x$ have $a^T x \le b$,

    (Inv2) $\mathcal{I}$ is an affinely independent set
           of moderately constrained tangled tours,

    (Inv3) $a^T x = b$ whenever $x \in \mathcal{I}$,

    (Inv4) $a^T x^* > b$.

Concorde maintains these four invariants while adding new elements to $\mathcal{I}$ and
adjusting $a$ and $b$ if necessary: when $|\mathcal{I}|$ reaches $|E_{1/2}|$, the current cut $a^T x \le b$
induces a facet of the convex hull of all moderately constrained tangled tours.
An outline of this process is provided in Algorithm 4.2.

Algorithm 4.2. From a cut to a facet-inducing cut in Phase 1:

while $|\mathcal{I}| < |E_{1/2}|$
do  find an integer vector $v$, an integer $w$, and
        a moderately constrained tangled tour $x^0$ such that
        (A1) $v^T x = w$ whenever $x \in \mathcal{I}$,
        (A2) some moderately constrained tangled tour $x$ has $v^T x \ne w$,
        and either
        (A3.1) $v^T x^* \ge w$ and $v^T x \ge w$ for all moderately constrained
               tangled tours $x$
        or else
        (A3.2) $a^T x^0 < b$ and $v^T x^0 = w$;
    find an integer vector $a'$, an integer $b'$, and
        a moderately constrained tangled tour $x'$ such that
        (B1) all moderately constrained tangled tours $x$ have $a'^T x \le b'$,
        (B2) equation $a'^T x = b'$ is a linear combination of
             $a^T x = b$ and $v^T x = w$,
        (B3) $a'^T x' = b'$ and $(a^T x', v^T x') \ne (b, w)$,
        (B4) $a'^T x^* > b'$;
    $a = a'$, $b = b'$, $\mathcal{I} = \mathcal{I} \cup \{x'\}$;
end
return $a$ and $b$;



In our illustrative example, the initial $a^T x \le b$ reads

$$-2x_{03} - 2x_{05} - 2x_{06} + x_{12} + x_{25} + x_{47} \le 1$$

and Algorithm 4.2 may proceed as follows.

Iteration 1: $v^T x = x_{03}$, $w = 0$, and $x^0$ is an arbitrary moderately constrained
tangled tour. We leave $a^T x \le b$ unchanged and we add to $\mathcal{I}$
• the moderately constrained tangled tour 0-1-2-5-6-7-4-3-0.

Iteration 2: $v^T x = x_{05}$, $w = 0$, and $x^0$ is an arbitrary moderately constrained
tangled tour. We replace $a^T x \le b$ by

$$-2x_{03} - x_{05} - 2x_{06} + x_{12} + x_{25} + x_{47} \le 1 \qquad (16)$$

and we add to $\mathcal{I}$
• the moderately constrained tangled tour 0-1-2-5-0-4-3-6-7-0.

Iteration 3: $v^T x = x_{06}$, $w = 0$, and $x^0$ is an arbitrary moderately constrained
tangled tour. We leave $a^T x \le b$ unchanged and we add to $\mathcal{I}$
• the moderately constrained tangled tour 0-1-2-5-3-4-7-6-0.

Now $|\mathcal{I}|$ has reached $|E_{1/2}|$, and so (16) induces a facet of the convex hull of
moderately constrained tangled tours.






Finding $v$, $w$, and $x^0$ in Algorithm 4.2. In early iterations of the while
loop in Algorithm 4.2, Concorde draws $v$ and $w$ from a catalog of inequalities
$v^T x \ge w$ that satisfy (A3.1) and (A2); any of these inequalities that happens to
satisfy (A1) can provide the $v$ and the $w$ for use, with an arbitrary moderately
constrained tangled tour $x^0$, in the next iteration of the while loop. The catalog
consists of

    all inequalities $x_e \ge 0$ such that $e \in E_{1/2}$,
    all inequalities $-x_e \ge -1$ such that $e \in E_{1/2}$,
    all inequalities $x_{0u} \ge 0$ such that $u \in \{1, 2, \ldots, k\}$;

for all $u$ in $\{1, 2, \ldots, k\}$, Concorde's way of shrinking the original vector into $x^*$ -
whose description is deferred to Sect. 5 - guarantees the existence of a moderately
constrained tangled tour $x$ such that $x_{0u} > 0$.

If no inequality $v^T x \ge w$ in the catalog satisfies (A1) and yet $|\mathcal{I}| < |E_{1/2}|$,
then Concorde finds $v$ as a nonzero solution of the system

    $v^T x = 0$ for all $x$ in $\mathcal{I}$,    (17)

    $v_e = 0$ for all $e$ outside $E_{1/2}$,    (18)

it sets $w = 0$, and it lets $x^0$ be the moderately constrained tangled tour such
that $x^0_e = 0$ for all $e$ in $E_{1/2}$.

For each $e$ in $E_{1/2}$, let $x^e$ denote the moderately constrained tangled tour
such that $x^e_e = 1$ and $x^e_f = 0$ for all other $f$ in $E_{1/2}$. Property (18) guarantees
that $v^T x^e = v_e$ for all $e$ in $E_{1/2}$; since $v$ is a nonzero vector with property (18),
at least one $e$ in $E_{1/2}$ has $v_e \ne 0$; it follows that (A2) holds.

To see that $a^T x^0 < b$, observe that

$$\sum(x^*_e x^e : e \in E_{1/2}) + \left(1 - \sum(x^*_e : e \in E_{1/2})\right) x^0 = x^*,$$

and so

$$\left(1 - \sum(x^*_e : e \in E_{1/2})\right) a^T x^0 = a^T x^* - \sum(x^*_e\, a^T x^e : e \in E_{1/2}) > b\left(1 - \sum(x^*_e : e \in E_{1/2})\right);$$

since $a^T x^0 \le b$ by (Inv1), this strict inequality forces
$1 - \sum(x^*_e : e \in E_{1/2}) < 0$ and $a^T x^0 < b$.
To see that $v^T x^0 = 0$, note that $v_e x^0_e = 0$ for all $e$.

Finding $a'$, $b'$, and $x'$ in Algorithm 4.2. To add new elements to $\mathcal{I}$ and to
adjust $a$ and $b$ if necessary, Concorde's implementation of Algorithm 4.2 uses a
function Tilt, which, given integer vectors $a$, $v$, integers $b$, $w$, and a moderately
constrained tangled tour $x^0$ with the property

    if all moderately constrained tangled tours $x$ have $v^T x \le w$,
    then $a^T x^0 < b$ and $v^T x^0 = w$,

returns a nonzero integer vector $a'$, an integer $b'$, and a moderately constrained
tangled tour $x'$ with properties (B1),

    (B2+) inequality $a'^T x \le b'$ is a nonnegative linear combination of
          $a^T x \le b$ and $v^T x \le w$,

and (B3).

In the iterations of the while loop in Algorithm 4.2 where $v$ and $w$ are drawn
from the catalog, Concorde calls Tilt$(a, b, v, w, x^0)$ for $(a', b', x')$. To see that
$a'^T x^* > b'$, note that there are nonnegative $\lambda$ and $\mu$ such that $a' = \lambda a + \mu v$,
$b' = \lambda b + \mu w$; that $v^T x^* \ge w$; and that $\lambda > 0$ since $a'^T x \le b'$ is satisfied by all
moderately constrained tangled tours but $v^T x \le w$ is not.

In the iterations of the while loop in Algorithm 4.2 where $w = 0$ and $v$ is
computed by solving the system (17), (18), Concorde computes

    $(a^+, b^+, x^+)$ = Tilt$(a, b, v, 0, x^0)$,

    $(a^-, b^-, x^-)$ = Tilt$(a, b, -v, 0, x^0)$

and then it sets

    $a' = a^+$, $b' = b^+$, $x' = x^+$ if ${a^+}^T x^* - b^+ \ge {a^-}^T x^* - b^-$,

    $a' = a^-$, $b' = b^-$, $x' = x^-$ otherwise.

To show that $a'^T x^* > b'$, we are going to prove that

    inequality $a^T x \le b$ is a nonnegative linear combination of
    ${a^+}^T x \le b^+$ and ${a^-}^T x \le b^-$.

There are nonnegative numbers $\lambda^+$ and $\mu^+$ such that

$$a^+ = \lambda^+ a + \mu^+ v,\qquad b^+ = \lambda^+ b$$

and there are nonnegative numbers $\lambda^-$ and $\mu^-$ such that

$$a^- = \lambda^- a - \mu^- v,\qquad b^- = \lambda^- b.$$

Since ${a^+}^T x \le b^+$ and ${a^-}^T x \le b^-$ for all moderately constrained tangled tours $x$
and since some moderately constrained tangled tour $x$ has $v^T x \ne 0$,

    at least one of $\lambda^+$, $\lambda^-$ is positive.

If $\mu^+ = 0$, then $\lambda^+ > 0$, and so $a^T x \le b$ is a positive multiple of ${a^+}^T x \le b^+$; if
$\mu^- = 0$, then $\lambda^- > 0$, and so $a^T x \le b$ is a positive multiple of ${a^-}^T x \le b^-$; if
$\mu^+ > 0$ and $\mu^- > 0$, then $\lambda^+\mu^- + \lambda^-\mu^+ > 0$ and

$$a = \frac{\mu^-}{\lambda^+\mu^- + \lambda^-\mu^+}\,a^+ + \frac{\mu^+}{\lambda^+\mu^- + \lambda^-\mu^+}\,a^-,$$

$$b = \frac{\mu^-}{\lambda^+\mu^- + \lambda^-\mu^+}\,b^+ + \frac{\mu^+}{\lambda^+\mu^- + \lambda^-\mu^+}\,b^-.$$

Implementation of Tilt. Tilt could be implemented by the Dinkelbach
method of fractional programming (see, for instance, Sect. 5.6 of Craven [15]
or Sect. 4.5 of Stancu-Minasian) as in Algorithm 4.3.

Algorithm 4.3. Tilt$(a, b, v, w, x^0)$:

$x$ = moderately constrained tangled tour that maximizes $v^T x$;
$\lambda = v^T x - w$, $\mu = b - a^T x$;
if $\lambda = 0$
then return $(v, w, x^0)$;
else if $\mu = 0$
     then return $(a, b, x)$;
     else return Tilt$(a, b, \lambda a + \mu v, \lambda b + \mu w, x)$;
     end
end

For illustration, consider Iteration 2 in our example. Here,

$$a_{03} = -2,\quad a_{05} = -2,\quad a_{06} = -2,\quad a_{12} = 1,\quad a_{25} = 1,\quad a_{47} = 1,$$

and $a_{ij} = 0$ for all other $i$ and $j$ such that $0 \le i < j \le 7$; we have $b = 1$;

$$v_{05} = 1$$

and $v_{ij} = 0$ for all other $i$ and $j$ such that $0 \le i < j \le 7$; we have $w = 0$; vector
$x^0$ may be any moderately constrained tangled tour.

In the nested recursive calls of Tilt$(a, b, v, w, x)$, values of $a$ and $b$ do not
change, but values of $v$ and $w$ do; in our illustration, we may specify each new
$v$ as

$$[v_{03}, v_{05}, v_{06}, v_{12}, v_{25}, v_{47}]^T$$

since $v_{ij} = 0$ for all other $i$ and $j$ such that $0 \le i < j \le 7$. In this notation, a
record of the computations may go as follows.

Tilt$(a, b, [0, 1, 0, 0, 0, 0]^T, 0, x^0)$:
    the $x$ returned by Oracle is 0-1-0-2-0-3-4-0-5-0-6-7-0;
    $\lambda = 2$, $\mu = 9$;
    Tilt$(a, b, [-4, 5, -4, 2, 2, 2]^T, 2, x)$:
        the $x$ returned by Oracle is 0-1-2-0-4-3-6-7-0-5-0;
        $\lambda = 10$, $\mu = 4$;
        Tilt$(a, b, [-36, 0, -36, 18, 18, 18]^T, 18, x)$:
            the $x$ returned by Oracle is 0-1-2-5-0-4-3-6-7-0;
            $\lambda = 18$, $\mu = 1$;
            Tilt$(a, b, [-72, -36, -72, 36, 36, 36]^T, 36, x)$:
                the $x$ returned by Oracle is 0-1-2-5-0-4-3-6-7-0;
                $\lambda = 0$, $\mu = 1$;
                return $([-72, -36, -72, 36, 36, 36]^T, 36,$ 0-1-2-5-0-4-3-6-7-0$)$;
            return $([-72, -36, -72, 36, 36, 36]^T, 36,$ 0-1-2-5-0-4-3-6-7-0$)$;
        return $([-72, -36, -72, 36, 36, 36]^T, 36,$ 0-1-2-5-0-4-3-6-7-0$)$;
    return $([-72, -36, -72, 36, 36, 36]^T, 36,$ 0-1-2-5-0-4-3-6-7-0$)$;

Concorde implements Tilt as a modified version of Algorithm 4.3: before
each recursive call of Tilt$(a, b, v, w, x)$, it divides $w$ and all the components of
$v$ by their greatest common divisor. This policy makes

    Tilt$(a, b, [0, 1, 0, 0, 0, 0]^T, 0, x^0)$

    Tilt$(a, b, [-4, 5, -4, 2, 2, 2]^T, 2, x)$

    Tilt$(a, b, [-2, 0, -2, 1, 1, 1]^T, 1, x)$

    Tilt$(a, b, [-2, -1, -2, 1, 1, 1]^T, 1, x)$

the four invocations of Tilt in our example and it makes

    $([-2, -1, -2, 1, 1, 1]^T, 1,$ 0-1-2-5-0-4-3-6-7-0$)$

their common return value.
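The recursion with the greatest-common-divisor normalization can be tried out on the illustrative example with the Python sketch below. It is not Concorde's code: the branch-and-bound and branch-and-cut Oracle is replaced here by brute-force enumeration of the moderately constrained tangled tours of this small example (path systems on $\{1,\ldots,7\}$ that contain the edges 34 and 67 and otherwise use only edges of $E_{1/2}$), and all identifiers are assumptions. Since ties among maximizers may be broken differently, the intermediate calls need not coincide with the record above.

```python
from functools import reduce
from itertools import combinations
from math import gcd

E_HALF = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 5), (3, 6), (4, 7), (5, 6)]
E_ONE = [(3, 4), (6, 7)]

def is_path_system(edges):
    """True if every vertex has degree at most 2 and the edges contain no cycle."""
    parent = {v: v for v in range(1, 8)}
    degree = {v: 0 for v in range(1, 8)}
    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        ru, rv = find(u), find(v)
        if ru == rv or degree[u] > 2 or degree[v] > 2:
            return False
        parent[ru] = rv
    return True

def moderate_tours():
    """All moderately constrained tangled tours of the example, as edge dictionaries."""
    tours = []
    for size in range(len(E_HALF) + 1):
        for chosen in combinations(E_HALF, size):
            inner = E_ONE + list(chosen)
            if is_path_system(inner):
                x = {e: 1 for e in inner}
                for u in range(1, 8):
                    spokes = 2 - sum(u in e for e in inner)   # x_{0u} = 2 - inner degree
                    if spokes:
                        x[(0, u)] = spokes
                tours.append(x)
    return tours

TOURS = moderate_tours()

def dot(c, x):
    return sum(c.get(e, 0) * val for e, val in x.items())

def tilt(a, b, v, w, x0):
    """Algorithm 4.3 with v and w divided by their gcd before each recursive call."""
    x = max(TOURS, key=lambda t: dot(v, t))       # brute-force stand-in for Oracle
    lam, mu = dot(v, x) - w, b - dot(a, x)
    if lam == 0:
        return v, w, x0
    if mu == 0:
        return a, b, x
    new_v = {e: lam * a.get(e, 0) + mu * v.get(e, 0) for e in set(a) | set(v)}
    new_w = lam * b + mu * w
    g = reduce(gcd, list(new_v.values()) + [new_w])
    return tilt(a, b, {e: c // g for e, c in new_v.items()}, new_w // g, x)

A = {(0, 3): -2, (0, 5): -2, (0, 6): -2, (1, 2): 1, (2, 5): 1, (4, 7): 1}
a_new, b_new, x_new = tilt(A, 1, {(0, 5): 1}, 0, TOURS[0])
print(sorted(a_new.items()), b_new)
```

The returned inequality is valid for every enumerated tour and holds with equality on the returned tour, which are properties (B1) and (B3) for this instance.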

4.4 Phase 2: From Moderately Constrained to Weakly Constrained
Tangled Tours

We will write

$$E_0 = \{e : e \subseteq \{1, 2, \ldots, k\},\ x^*_e = 0\},$$

$$E_1 = \{e : e \subseteq \{1, 2, \ldots, k\},\ x^*_e = 1\};$$

in this notation, a weakly constrained tangled tour $x$ is moderately constrained
if and only if

    $x_e = 0$ whenever $e \in E_0$ and $x_e = 1$ whenever $e \in E_1$.

The linear inequality $a^T x \le b$ constructed in Phase 1 separates $x^*$ from all
moderately constrained tangled tours and induces a facet of their convex hull;
in Phase 2, we find integers $\Delta_e$ ($e \in E_0 \cup E_1$) such that the inequality

$$a^T x + \sum(\Delta_e x_e : e \in E_0 \cup E_1) \le b + \sum(\Delta_e : e \in E_1)$$

separates $x^*$ from all weakly constrained tangled tours and induces a facet
of their convex hull. A way of computing the $\Delta_e$ one by one originated in the
work of Gomory (1969) and was elaborated by Balas (1975), Hammer, Johnson,
and Peled (1975), Padberg (1973, 1975), Wolsey (1975a, 1975b), and others;
it is known as sequential lifting; its application in our context is described in
Algorithm 4.4. Both while loops in this algorithm maintain the invariant

    $a^T x \le b$ induces a facet of the convex hull of
    all weakly constrained tangled tours $x$ such that
    $x_e = 0$ whenever $e \in F_0$ and $x_e = 1$ whenever $e \in F_1$.    (19)

Algorithm 4.4. Sequential lifting

$F_0 = E_0$, $F_1 = E_1$;
while $F_1 \ne \emptyset$
do  $f$ = an edge in $F_1$;
    find a weakly constrained tangled tour $x^{\max}$ that
        maximizes $a^T x$ subject to
        $x_e = 0$ whenever $e \in F_0 \cup \{f\}$,
        $x_e = 1$ whenever $e \in F_1 - \{f\}$;
    replace $a^T x \le b$ by $a^T x + (a^T x^{\max} - b)x_f \le a^T x^{\max}$;
    delete $f$ from $F_1$;
end
while $F_0 \ne \emptyset$
do  $f$ = an edge in $F_0$;
    find a weakly constrained tangled tour $x^{\max}$ that
        maximizes $a^T x$ subject to
        $x_e = 0$ whenever $e \in F_0 - \{f\}$,
        $x_f = 1$;
    replace $a^T x \le b$ by $a^T x + (b - a^T x^{\max})x_f \le b$;
    delete $f$ from $F_0$;
end



Concorde enters Phase 2 with an inequality $a^T x \le b$ such that

    (i) $a^T x \le b$ induces a facet of the convex hull of
        all moderately constrained tangled tours,

    (ii) $a^T x^* > b$,

    (iii) $a_e = 0$ for all $e$ outside $E_{1/2}$.

An arbitrary inequality $a^T x \le b$ with properties (i), (ii) can be made to satisfy
(iii) as well by first substituting $2 - \sum(x_e : 0 \notin e,\ u \in e)$ for all $x_{0u}$ and then
substituting 0 for all $x_e$ such that $e \in E_0$ and substituting 1 for all $x_e$ such that
$e \in E_1$. In our illustrative example, these substitutions convert inequality

$$-2x_{03} - x_{05} - 2x_{06} + x_{12} + x_{25} + x_{47} \le 1$$

with properties (i), (ii) to inequality

$$x_{12} + x_{15} + 2x_{23} + 2x_{25} + 3x_{35} + 4x_{36} + x_{47} + 3x_{56} \le 7$$

with properties (i), (ii), and (iii).
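These substitutions are easy to automate. The Python sketch below (the function name `eliminate` and the dictionary representation are assumptions for this illustration) applies them to the inequality of the example.

```python
def eliminate(coeffs, rhs, k, e_zero, e_one):
    """Substitute x_0u = 2 - (inner edges at u), x_e = 0 on E_0, and x_e = 1 on E_1."""
    out = {}
    for (i, j), c in coeffs.items():
        if i == 0:
            # c * x_0j = 2c - c * sum(x_e : e inside {1..k}, j in e)
            rhs -= 2 * c
            for u in range(1, k + 1):
                if u != j:
                    e = (min(u, j), max(u, j))
                    out[e] = out.get(e, 0) - c
        else:
            out[(i, j)] = out.get((i, j), 0) + c
    for e in e_one:                 # x_e = 1: the coefficient moves to the right-hand side
        rhs -= out.pop(e, 0)
    for e in e_zero:                # x_e = 0: the term disappears
        out.pop(e, None)
    return {e: c for e, c in out.items() if c}, rhs

E_ZERO = [(1, 3), (1, 4), (1, 6), (1, 7), (2, 4), (2, 6), (2, 7),
          (3, 7), (4, 5), (4, 6), (5, 7)]
E_ONE = [(3, 4), (6, 7)]
before = {(0, 3): -2, (0, 5): -1, (0, 6): -2, (1, 2): 1, (2, 5): 1, (4, 7): 1}
after, rhs = eliminate(before, 1, 7, E_ZERO, E_ONE)
print(sorted(after.items()), rhs)
```

The resulting coefficients are supported on $E_{1/2}$ only, which is property (iii).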

Concorde implements Phase 2 as a streamlined version of Algorithm 4.4; to
describe this version, we will write

$$V^{\mathrm{int}} = \{1, 2, \ldots, k\} \quad\text{and}\quad E^{\mathrm{int}} = \{e : e \subseteq V^{\mathrm{int}},\ |e| = 2\}.$$

Since every weakly constrained tangled tour $x$ satisfies

    $x_{0u} = 2 - \sum(x_e : e \in E^{\mathrm{int}},\ u \in e)$ for all $u$ in $V^{\mathrm{int}}$,

it is determined by its restriction on $E^{\mathrm{int}}$. Restrictions of weakly constrained
tangled tours on $E^{\mathrm{int}}$ are precisely the incidence vectors of the edge-sets of path
systems - meaning graphs whose connected components are paths - with vertex-set
$V^{\mathrm{int}}$. The set of all path systems with vertex-set $V^{\mathrm{int}}$ is monotone in the sense
that the removal of an edge from a path system yields another path system.

Monotonicity of the set of all path systems with vertex-set $V^{\mathrm{int}}$ implies that,
for all choices of subsets $R_0$, $R_1$ of $E^{\mathrm{int}}$ and for all choices of objective functions
$a^T x$ such that

    $a_e = 0$ for all $e$ outside $E^{\mathrm{int}}$

(this property of $a$ is maintained by Algorithm 4.4), the problem of finding a
weakly constrained tangled tour that

    maximizes $a^T x$ subject to
    $x_e = 0$ whenever $e \in R_0$,
    $x_e = 1$ whenever $e \in R_1$

either has no feasible solution at all or else has an optimal solution such that

    $x_e = 0$ whenever $e \in E^{\mathrm{int}} - (R_0 \cup R_1)$ and $a_e = 0$.

Concorde exploits this observation in a couple of ways: it makes the job of
Oracle easier by adding constraint

    $x_e = 0$ whenever $e \in E^{\mathrm{int}} - (F_0 \cup F_1)$ and $a_e = 0$

to the problem of finding $x^{\max}$ and it skips certain calls of Oracle altogether.

The second of these two tricks begins with the set $\mathcal{I}$ of $|E_{1/2}|$ moderately
constrained tangled tours produced in Phase 1: for each element $x'$ of $\mathcal{I}$, Concorde
deletes from $F_0$ all the edges $uv$ such that $u$ and $v$ are endpoints of distinct paths
in the path system defined by $x'$. To see that each of these deletions preserves
invariant (19), consider the weakly constrained tangled tour $x''$ that

    maximizes $a^T x$ subject to
    $x_e = 0$ whenever $e \in F_0 - \{uv\}$,
    $x_e = 1$ whenever $e \in F_1 \cup \{uv\}$.

On the one hand, the path system defined by $x'$ with edge $uv$ added shows that
$a^T x'' \ge b$; on the other hand, the path system defined by $x''$ with edge $uv$ deleted
shows that $a^T x'' \le b$; we conclude that $a^T x'' = b$, and so the deletion of $uv$ from
$F_0$ preserves invariant (19).

This trick applies not only to the elements of $\mathcal{I}$, but also to each weakly
constrained tangled tour $x^{\max}$ found by Oracle in Phase 2: as soon as it finds a new
$x^{\max}$, Concorde deletes from $F_0$ all the edges $uv$ such that $u$ and $v$ are endpoints of
distinct paths in the path system defined by $x^{\max}$.

In our illustrative example, we begin with

    $F_0 = \{13, 14, 16, 17, 24, 26, 27, 37, 45, 46, 57\}$, $F_1 = \{34, 67\}$

and then we examine the eight path systems defined by $\mathcal{I}$:

• the path system 4-3-2-1-5-6-7 is a single path,

• the path system with components 1-2 and 4-3-5-6-7
  eliminates edges 14, 17, 24, 27 from $F_0$,

• the path system with components 1-5-2 and 4-3-6-7
  eliminates edges 14, 17, 24, 27 from $F_0$,

• the path system with components 1 and 4-3-2-5-6-7
  eliminates edges 14, 17 from $F_0$,

• the path system 1-5-6-7-4-3-2 is a single path,

• the path system 1-2-5-6-7-4-3 is a single path,

• the path system with components 1-2-5 and 4-3-6-7
  eliminates edges 14, 17, 45, 57 from $F_0$,

• the path system 1-2-5-3-4-7-6 is a single path.
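The eliminations listed above can be recomputed mechanically. The following is a minimal sketch, not Concorde's code: the function names `endpoints` and `eliminated_edges` and the representation of a path system as a list of vertex sequences are assumptions made for illustration.

```python
from itertools import combinations

def endpoints(path):
    """Endpoints of a path given as a vertex sequence (a single point if the path is trivial)."""
    return {path[0], path[-1]}

def eliminated_edges(path_system):
    """Edges uv such that u and v are endpoints of two distinct paths of the system."""
    edges = set()
    for p, q in combinations(path_system, 2):
        for u in endpoints(p):
            for v in endpoints(q):
                edges.add((min(u, v), max(u, v)))
    return edges

# F_0 and the eight path systems of the illustrative example.
F0 = {(1, 3), (1, 4), (1, 6), (1, 7), (2, 4), (2, 6), (2, 7),
      (3, 7), (4, 5), (4, 6), (5, 7)}
systems = [
    [[4, 3, 2, 1, 5, 6, 7]],
    [[1, 2], [4, 3, 5, 6, 7]],
    [[1, 5, 2], [4, 3, 6, 7]],
    [[1], [4, 3, 2, 5, 6, 7]],
    [[1, 5, 6, 7, 4, 3, 2]],
    [[1, 2, 5, 6, 7, 4, 3]],
    [[1, 2, 5], [4, 3, 6, 7]],
    [[1, 2, 5, 3, 4, 7, 6]],
]
for system in systems:
    F0 -= eliminated_edges(system)

print(sorted(F0))  # [(1, 3), (1, 6), (2, 6), (3, 7), (4, 6)]
```

The surviving set {13, 16, 26, 37, 46} is the $F_0$ with which the worked iterations start.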

The initial $a^T x \le b$ reads

    $x_{12} + x_{15} + 2x_{23} + 2x_{25} + 3x_{35} + 4x_{36} + x_{47} + 3x_{56} \le 7$.    (20)

The first while loop of Algorithm 4.4 may go as follows.

Iteration 1: $F_0 = \{13, 16, 26, 37, 46\}$, $F_1 = \{34, 67\}$, $f = 34$.

Oracle, called to find a weakly constrained tangled tour that maximizes
the left-hand side $a^T x$ of (20) subject to

    $x_{13} = x_{14} = x_{16} = x_{17} = x_{24} = x_{26} = x_{27} = x_{37} = x_{45} = x_{46} = x_{57} = 0$,
    $x_{34} = 0$, and $x_{67} = 1$,

returns the $x^{\max}$ represented by

• the path system with the single component 1-2-5-3-6-7-4;

since $a^T x^{\max} = 11$, we replace (20) by

    $x_{12} + x_{15} + 2x_{23} + 2x_{25} + 4x_{34} + 3x_{35} + 4x_{36} + x_{47} + 3x_{56} \le 11$.    (21)

Iteration 2: $F_0 = \{13, 16, 26, 37, 46\}$, $F_1 = \{67\}$, $f = 67$.

Oracle, called to find a weakly constrained tangled tour that maximizes
the left-hand side $a^T x$ of (21) subject to

    $x_{13} = x_{14} = x_{16} = x_{17} = x_{24} = x_{26} = x_{27} = x_{37} = x_{45} = x_{46} = x_{57} = 0$,
    $x_{67} = 0$,

returns the $x^{\max}$ represented by

• the path system with the single component 1-2-5-6-3-4-7;

since $a^T x^{\max} = 15$, we replace (21) by

    $x_{12} + x_{15} + 2x_{23} + 2x_{25} + 4x_{34} + 3x_{35} + 4x_{36} + x_{47} + 3x_{56} + 4x_{67} \le 15$.    (22)



The second while loop of Algorithm 4.4 may go as follows.

Iteration 1: $F_0 = \{13, 16, 26, 37, 46\}$, $F_1 = \emptyset$, $f = 13$.

Oracle, called to find a weakly constrained tangled tour that maximizes
the left-hand side $a^T x$ of (22) subject to

    $x_{14} = x_{16} = x_{17} = x_{24} = x_{26} = x_{27} = x_{37} = x_{45} = x_{46} = x_{57} = 0$,
    $x_{13} = 1$,

returns the $x^{\max}$ represented by

• the path system with the single component 1-3-4-7-6-5-2;

since $a^T x^{\max} = 14$, we replace (22) by

    $x_{12} + x_{13} + x_{15} + 2x_{23} + 2x_{25} + 4x_{34}$
    $+ 3x_{35} + 4x_{36} + x_{47} + 3x_{56} + 4x_{67} \le 15$.    (23)

Iteration 2: $F_0 = \{16, 26, 37, 46\}$, $F_1 = \emptyset$, $f = 16$.

Oracle, called to find a weakly constrained tangled tour that maximizes
the left-hand side $a^T x$ of (23) subject to

    $x_{14} = x_{17} = x_{24} = x_{26} = x_{27} = x_{37} = x_{45} = x_{46} = x_{57} = 0$,
    $x_{16} = 1$,

returns the $x^{\max}$ represented by

• the path system with the single component 1-6-7-4-3-5-2;

since $a^T x^{\max} = 14$, we replace (23) by

    $x_{12} + x_{13} + x_{15} + x_{16} + 2x_{23} + 2x_{25} + 4x_{34}$
    $+ 3x_{35} + 4x_{36} + x_{47} + 3x_{56} + 4x_{67} \le 15$.    (24)

Iteration 3: $F_0 = \{26, 37, 46\}$, $F_1 = \emptyset$, $f = 26$.

Oracle, called to find a weakly constrained tangled tour that maximizes
the left-hand side $a^T x$ of (24) subject to

    $x_{14} = x_{17} = x_{24} = x_{27} = x_{37} = x_{45} = x_{46} = x_{57} = 0$,
    $x_{26} = 1$,

returns the $x^{\max}$ represented by

• the path system with the single component 1-5-3-4-7-6-2;

since $a^T x^{\max} = 13$, we replace (24) by

    $x_{12} + x_{13} + x_{15} + x_{16} + 2x_{23} + 2x_{25} + 2x_{26} + 4x_{34}$
    $+ 3x_{35} + 4x_{36} + x_{47} + 3x_{56} + 4x_{67} \le 15$.    (25)

Iteration 4: $F_0 = \{37, 46\}$, $F_1 = \emptyset$, $f = 37$.

Oracle, called to find a weakly constrained tangled tour that maximizes
the left-hand side $a^T x$ of (25) subject to

    $x_{14} = x_{17} = x_{24} = x_{27} = x_{45} = x_{46} = x_{57} = 0$,
    $x_{37} = 1$,

returns the $x^{\max}$ represented by

• the path system with the single component 1-2-5-6-7-3-4;

since $a^T x^{\max} = 14$, we replace (25) by

    $x_{12} + x_{13} + x_{15} + x_{16} + 2x_{23} + 2x_{25} + 2x_{26} + 4x_{34}$
    $+ 3x_{35} + 4x_{36} + x_{37} + x_{47} + 3x_{56} + 4x_{67} \le 15$.    (26)

Iteration 5: $F_0 = \{46\}$, $F_1 = \emptyset$, $f = 46$.

Oracle, called to find a weakly constrained tangled tour that maximizes
the left-hand side $a^T x$ of (26) subject to

    $x_{14} = x_{17} = x_{24} = x_{27} = x_{45} = x_{57} = 0$,
    $x_{46} = 1$,

returns the $x^{\max}$ represented by

• the path system with the single component 1-2-5-3-4-6-7;

since $a^T x^{\max} = 14$, we replace (26) by

    $x_{12} + x_{13} + x_{15} + x_{16} + 2x_{23} + 2x_{25} + 2x_{26} + 4x_{34}$
    $+ 3x_{35} + 4x_{36} + x_{37} + x_{46} + x_{47} + 3x_{56} + 4x_{67} \le 15$.    (27)

After this iteration, $F_0 = F_1 = \emptyset$, and so (27) induces a facet of the convex
hull of all weakly constrained tangled tours.
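The arithmetic of the seven iterations can be replayed in a few lines. This is only a sketch under stated assumptions: the update rule (new coefficient = amount needed to keep the inequality tight at the tour returned by Oracle) is read off from inequalities (20) to (27) rather than quoted from Algorithm 4.4, the tours are the ones listed in the text instead of being produced by a real Oracle, and the helper name `value` is hypothetical.

```python
def value(a, path):
    """Left-hand side a^T x of the tangled tour represented by a single path."""
    return sum(a.get((min(u, v), max(u, v)), 0) for u, v in zip(path, path[1:]))

# Coefficients and right-hand side of the initial inequality (20).
a = {(1, 2): 1, (1, 5): 1, (2, 3): 2, (2, 5): 2,
     (3, 5): 3, (3, 6): 4, (4, 7): 1, (5, 6): 3}
b = 7

# First while loop: edge f leaves F_1; the tour maximizes a^T x subject to x_f = 0.
first_loop = [((3, 4), [1, 2, 5, 3, 6, 7, 4]),
              ((6, 7), [1, 2, 5, 6, 3, 4, 7])]
for f, tour in first_loop:
    best = value(a, tour)
    a[f] = best - b      # assumed update: coefficient of x_f absorbs the increase
    b = best             # the right-hand side moves up to the new maximum

# Second while loop: edge f leaves F_0; the tour maximizes a^T x subject to x_f = 1.
second_loop = [((1, 3), [1, 3, 4, 7, 6, 5, 2]),
               ((1, 6), [1, 6, 7, 4, 3, 5, 2]),
               ((2, 6), [1, 5, 3, 4, 7, 6, 2]),
               ((3, 7), [1, 2, 5, 6, 7, 3, 4]),
               ((4, 6), [1, 2, 5, 3, 4, 6, 7])]
for f, tour in second_loop:
    a[f] = b - value(a, tour)   # assumed update: lift x_f until the tour becomes tight

print(b, dict(sorted(a.items())))
```

Under these assumptions the final coefficients and right-hand side coincide with inequality (27).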

4.5 Phase 3: From Weakly Constrained to All Tangled Tours 

Phase 2 produces an inequality $a^T x \le b$ such that

    (i) $a^T x \le b$ induces a facet of the convex hull of
        all weakly constrained tangled tours,

    (ii) $a^T x^* > b$,

    (iii) $a_e = 0$ for all $e$ outside $E^{\mathrm{int}}$;

since the set of restrictions of weakly constrained tangled tours onto $E^{\mathrm{int}}$ is
monotone, (i), (ii), and (iii) imply that

    (iv) $a_e \ge 0$ for all $e$ in $E^{\mathrm{int}}$.

A tangled tour $x$ is weakly constrained if and only if

    $\sum(x_e : w \in e) = 2$ for all $w$ in $V^{\mathrm{int}}$;

if a hypergraph inequality

    $\sum(\lambda_Q x(Q, V - Q) : Q \in \mathcal{H}) \ge \beta$    (28)

is related to $a^T x \le b$ in the sense that, for some numbers $\pi_w$ ($w \in V^{\mathrm{int}}$), the
left-hand side of (28) is identically equal to

    $\sum(\pi_w \sum(x_e : w \in e) : w \in V^{\mathrm{int}}) - 2a^T x$

and the right-hand side of (28) equals

    $\sum(2\pi_w : w \in V^{\mathrm{int}}) - 2b$,

then (28) induces a facet of the convex hull of all weakly constrained tangled
tours and is violated by $x^*$. In Phase 3, we find a hypergraph inequality that is
related to $a^T x \le b$ in this sense and is satisfied by all tangled tours. Algorithm 4.5
accomplishes this objective with a negligible amount of computation.

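Algorithm 4.5. Phase 3

construct a hypergraph $\mathcal{H}$ on $V^{\mathrm{int}}$ and positive integers $\lambda_Q$ ($Q \in \mathcal{H}$)
such that the linear form

    $\sum(\lambda_Q \sum(x_e : e \subseteq Q) : Q \in \mathcal{H})$

is identically equal to $a^T x$;
return the inequality

    $\sum(\lambda_Q x(Q, V - Q) : Q \in \mathcal{H}) \ge \sum(2\lambda_Q |Q| : Q \in \mathcal{H}) - 2b$.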



Arguments used by Naddef and Rinaldi show that the hypergraph inequality
returned by Algorithm 4.5 is satisfied by all tangled tours; it is a routine matter
to verify that the numbers $\pi_w$ defined by $\pi_w = \sum(\lambda_Q : w \in Q, Q \in \mathcal{H})$ have the
desired properties.

A straightforward way of constructing the $\mathcal{H}$ and the $\lambda_Q$ ($Q \in \mathcal{H}$) in
Algorithm 4.5 is to let $\mathcal{H}$ consist of all two-point subsets $e$ such that $a_e > 0$ and to set
$\lambda_e = a_e$ for all $e$ in $\mathcal{H}$. Instead, Concorde constructs $\mathcal{H}$ and $\lambda_Q$ ($Q \in \mathcal{H}$) by an
iterative greedy procedure: in each iteration, it chooses a maximal (with respect
to set-inclusion) subset $Q$ of $V^{\mathrm{int}}$ such that $a_e > 0$ for all edges $e$ with both
endpoints in $Q$, it lets $\lambda_Q$ be the smallest of these positive $a_e$, it brings $Q$ into
$\mathcal{H}$, and it subtracts $\lambda_Q$ from all $a_e$ such that $e$ has both endpoints in $Q$.

In our illustrative example, the inequality $a^T x \le b$ produced in Phase 2
reads

    $x_{12} + x_{13} + x_{15} + x_{16} + 2x_{23} + 2x_{25} + 2x_{26} + 4x_{34}$
    $+ 3x_{35} + 4x_{36} + x_{37} + x_{46} + x_{47} + 3x_{56} + 4x_{67} \le 15$;

we construct the $\mathcal{H}$ and the $\lambda_Q$ ($Q \in \mathcal{H}$) in Algorithm 4.5 as follows.

1. We choose $Q = \{1, 2, 3, 5, 6\}$, which yields $\lambda_Q = 1$ and leaves us with

    $a^T x = x_{23} + x_{25} + x_{26} + 4x_{34} + 2x_{35} + 3x_{36} + x_{37} + x_{46} + x_{47} + 2x_{56} + 4x_{67}$.

2. We choose $Q = \{2, 3, 5, 6\}$, which yields $\lambda_Q = 1$ and leaves us with

    $a^T x = 4x_{34} + x_{35} + 2x_{36} + x_{37} + x_{46} + x_{47} + x_{56} + 4x_{67}$.

3. We choose $Q = \{3, 4, 6, 7\}$, which yields $\lambda_Q = 1$ and leaves us with

    $a^T x = 3x_{34} + x_{35} + x_{36} + x_{56} + 3x_{67}$.

4. We choose $Q = \{3, 4\}$, which yields $\lambda_Q = 3$ and leaves us with

    $a^T x = x_{35} + x_{36} + x_{56} + 3x_{67}$.

5. We choose $Q = \{3, 5, 6\}$, which yields $\lambda_Q = 1$ and leaves us with

    $a^T x = 3x_{67}$.

6. We choose $Q = \{6, 7\}$, which yields $\lambda_Q = 3$ and leaves us with

    $a^T x = 0$.

The resulting hypergraph inequality,

    $x(\{1,2,3,5,6\}, \{0,4,7\})$
    $+ x(\{2,3,5,6\}, \{0,1,4,7\})$
    $+ x(\{3,4,6,7\}, \{0,1,2,5\})$
    $+ 3x(\{3,4\}, \{0,1,2,5,6,7\})$
    $+ x(\{3,5,6\}, \{0,1,2,4,7\})$
    $+ 3x(\{6,7\}, \{0,1,2,3,4,5\}) \ge 26$,

induces a facet of the convex hull of all weakly constrained tangled tours and is
violated by $x^*$. (By the way, this inequality belongs to the class of path
inequalities of Cornuéjols, Fonlupt, and Naddef (1985).)
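The greedy construction and the worked example can be checked with a short script. This is a sketch, not Concorde's implementation: the function name `greedy_hypergraph`, the choice of the lexicographically smallest remaining edge as the seed of each maximal set, and the order in which points are tried are all assumptions; the text only requires that each chosen $Q$ be maximal.

```python
from itertools import combinations

def greedy_hypergraph(coeff, points):
    """Decompose nonnegative edge coefficients into pairs (Q, lambda_Q).

    coeff maps edges (u, v) with u < v to coefficients a_e.
    Each Q is a maximal set all of whose internal edges still have a_e > 0.
    """
    a = {e: c for e, c in coeff.items() if c > 0}

    def positive(u, v):
        return a.get((min(u, v), max(u, v)), 0) > 0

    family = []
    while a:
        q = list(min(a))                      # assumed seed: smallest remaining edge
        for w in points:                      # one pass suffices to make q maximal
            if w not in q and all(positive(w, u) for u in q):
                q.append(w)
        inside = list(combinations(sorted(q), 2))
        lam = min(a[e] for e in inside)       # smallest positive coefficient inside Q
        family.append((frozenset(q), lam))
        for e in inside:
            a[e] -= lam
            if a[e] == 0:
                del a[e]
    return family

# Coefficients and right-hand side of the Phase 2 inequality of the example.
a27 = {(1, 2): 1, (1, 3): 1, (1, 5): 1, (1, 6): 1, (2, 3): 2, (2, 5): 2,
       (2, 6): 2, (3, 4): 4, (3, 5): 3, (3, 6): 4, (3, 7): 1, (4, 6): 1,
       (4, 7): 1, (5, 6): 3, (6, 7): 4}
b = 15

family = greedy_hypergraph(a27, range(1, 8))
rhs = sum(2 * lam * len(q) for q, lam in family) - 2 * b
for q, lam in family:
    print(sorted(q), lam)
print("right-hand side:", rhs)   # right-hand side: 26
```

With these tie-breaking choices the script happens to select the same six sets, in the same order, as the worked example; a different but equally valid seed rule could produce another decomposition.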

Every cut produced by Algorithm 4.5 can be easily transformed into a cut
that induces a facet of the graphical traveling salesman polytope. It can be shown
that, in case the $a^T x \le b$ produced in Phase 2 is such that

    every $a_e$ with $e \in E^{\mathrm{int}}$ is a positive integer,    (29)

this $a^T x \le b$ reads

    $\lambda \sum(x_e : e \in E^{\mathrm{int}}) \le \lambda(k - 1)$

for some positive $\lambda$, and so Algorithm 4.5 returns a positive multiple of the
subtour inequality

    $x(\{0\}, V^{\mathrm{int}}) \ge 2$;

as noted in Sect. 4.3, Cornuéjols, Fonlupt, and Naddef pointed out that
subtour inequalities induce facets of the convex hull of $\mathcal{T}$. It can also be shown
that, in case (29) fails, the following procedure transforms the cut

    $\sum(\lambda_Q x(Q, V - Q) : Q \in \mathcal{H}) \ge \beta$

produced by Algorithm 4.5 into a cut that induces a facet of the graphical
traveling salesman polytope:

Step 1. For all choices of distinct points $u$, $v$, $w$ of $V$, define

    $\tau(u, v, w) = \sum(\lambda_Q : Q \in \mathcal{H}, u \in Q, v \in Q, w \notin Q)$
    $\qquad\qquad + \sum(\lambda_Q : Q \in \mathcal{H}, u \notin Q, v \notin Q, w \in Q)$.

Step 2. For all points $w$ of $V^{\mathrm{int}}$, evaluate

    $\Delta_w = \min\{\tau(u, v, w) : uv \in E, u \ne w, v \ne w\}$.

Step 3. Return the inequality

    $\sum(\lambda_Q x(Q, V - Q) : Q \in \mathcal{H}) - \sum(\Delta_w x(\{w\}, V - \{w\}) : w \in V^{\mathrm{int}}) \ge$
    $\qquad \beta - 2\sum(\Delta_w : w \in V^{\mathrm{int}})$.

Again, the arguments come from Naddef and Rinaldi [53]: in their terminology,
the inequality returned in Step 3 is tight triangular.
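Steps 1 and 2 are easy to evaluate for a small hypergraph. The sketch below uses the six sets of the illustrative example; it assumes that $E$ is the edge set of the complete graph on $V = \{0, 1, \ldots, 7\}$, and the names `tau` and `delta` are chosen here for illustration only.

```python
from itertools import combinations

V = range(8)   # points 0, 1, ..., 7; the interior points are 1, ..., 7

# The hypergraph of the illustrative example as pairs (Q, lambda_Q).
H = [({1, 2, 3, 5, 6}, 1), ({2, 3, 5, 6}, 1), ({3, 4, 6, 7}, 1),
     ({3, 4}, 3), ({3, 5, 6}, 1), ({6, 7}, 3)]

def tau(u, v, w):
    """Total weight of the sets that separate w from both u and v (Step 1)."""
    return sum(lam for Q, lam in H
               if (u in Q and v in Q and w not in Q)
               or (u not in Q and v not in Q and w in Q))

def delta(w):
    """Minimum of tau(u, v, w) over all edges uv avoiding w (Step 2)."""
    return min(tau(u, v, w) for u, v in combinations(V, 2) if w not in (u, v))

print({w: delta(w) for w in range(1, 8)})
```

Every value printed is zero, so Step 3 would leave the inequality of the example unchanged.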

The current version of Concorde can handle only hypergraph constraints
with nonnegative coefficients; it settles for the cut produced by Algorithm 4.5
even when this cut does not induce a facet of the graphical traveling salesman
polytope. Concorde’s way of choosing $\mathcal{H}$ and $\lambda_Q$ ($Q \in \mathcal{H}$) aims to mitigate the
effects of this carelessness by reducing the number of points $w$ such that $\Delta_w > 0$.

In our illustrative example, $\mathcal{H}$ consists of

    $\{1,2,3,5,6\}$, $\{2,3,5,6\}$, $\{3,4,6,7\}$, $\{3,4\}$, $\{3,5,6\}$, $\{6,7\}$;

since no member of $\mathcal{H}$ contains both 1 and 4, assumption (29) is satisfied; since

    $\Delta_1 = \tau(0,3,1) = 0$,
    $\Delta_2 = \tau(0,3,2) = 0$,
    $\Delta_3 = \tau(4,5,3) = 0$,
    $\Delta_4 = \tau(0,3,4) = 0$,
    $\Delta_5 = \tau(0,3,5) = 0$,
    $\Delta_6 = \tau(3,7,6) = 0$,
    $\Delta_7 = \tau(0,6,7) = 0$,

we have $\Delta_w = 0$ for all $w = 1, 2, \ldots, 7$. It follows that this path inequality is
tight triangular and induces a facet of the graphical traveling salesman polytope.



5 Making Choices of $V_0, V_1, \ldots, V_k$

Concorde’s choices of $V_0, V_1, \ldots, V_k$ in Algorithm 3.1 are guided by $x^*$ in a way
similar to that used by Christof and Reinelt in their algorithm for finding
cuts that match templates from a prescribed large catalog. First, it constructs
once and for all an equivalence relation on $V$ in such a way that each equivalence
class $V^*$ of this relation satisfies

    $x^*(V^*, V - V^*) = 2$;

then it makes many different choices of $V_0, V_1, \ldots, V_k$ in such a way that each of
$V_1, \ldots, V_k$ is one of these equivalence classes and $V_0 = V - (V_1 \cup \ldots \cup V_k)$.

With $W$ standing for the set of the equivalence classes on $V$, the first stage
amounts to preshrinking $V$ onto $W$; making each of the many different choices
of $V_0, V_1, \ldots, V_k$ in the second stage means choosing a small subset of $W$ and
shrinking the entire remainder of $W$ onto a single point. In terms of the preshrunk
set $W$, each choice of $V_0, V_1, \ldots, V_k$ in the second stage zooms in onto a relatively
small part of the problem - typically $k$ is at most thirty or so and $|W|$ may run
to hundreds or thousands - and effectively discards the rest. For this reason,
we developed the habit of referring to the cuts produced by Algorithm 3.1 as
local cuts and referred to them by this name in Applegate et al. In terms
of the original $V$, each of the sets $V_1, \ldots, V_k$ could be quite large, which makes
the qualifier “local” a misnomer. Still, a crisp label for the cuts produced by
Algorithm 3.1 is convenient to have and “local cuts” seems to be as good a name
as any other that we managed to think up.

The equivalence relation is constructed by iteratively shrinking two-point
sets into a single point. At each stage of this process, we have a set $W$ and a
mapping $\pi : W \to 2^V$ that defines a partition of $V$ into pairwise disjoint subsets
$\pi(w)$ with $w \in W$. Initially, $W = V$ and each $\pi(w)$ is the singleton $\{w\}$; as long
as there are distinct elements $u$, $v$, $w$ of $W$ such that

    $x^*(\pi(u), \pi(v)) = 1$ and $x^*(\pi(u), \pi(w)) + x^*(\pi(v), \pi(w)) = 1$,    (30)

we keep replacing $\pi(u)$ by $\pi(u) \cup \pi(v)$ and removing $v$ from $W$; when there are no
$u$, $v$, $w$ with property (30), we stop. (During this process, we may discover pairs
$u$, $v$ with $x^*(\pi(u), \pi(v)) > 1$, in which case $x^*$ violates the subtour inequality
$x(Q, V - Q) \ge 2$ with $Q = \pi(u) \cup \pi(v)$.)
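The shrinking rule (30) can be sketched as a naive loop. Everything concrete below is an assumption made for illustration: the function names, the representation of $x^*$ as a dictionary over edges, the tolerance `eps`, and the small seven-point $x^*$ itself, which is invented for this sketch and does not come from the paper. No attempt is made at efficiency or at reporting pairs with $x^*(\pi(u), \pi(v)) > 1$.

```python
def cut_weight(xstar, A, B):
    """x*(A, B): total x* value on edges with one end in A and the other in B."""
    return sum(xstar.get((min(p, q), max(p, q)), 0.0) for p in A for q in B)

def preshrink(xstar, points, eps=1e-9):
    """Repeatedly merge pi(v) into pi(u) while some triple u, v, w satisfies (30)."""
    pi = {w: {w} for w in points}          # pi(w) is the subset of V shrunk onto w
    merged = True
    while merged:
        merged = False
        W = sorted(pi)
        for u in W:
            for v in W:
                if v == u or abs(cut_weight(xstar, pi[u], pi[v]) - 1.0) > eps:
                    continue
                if any(abs(cut_weight(xstar, pi[u], pi[w])
                           + cut_weight(xstar, pi[v], pi[w]) - 1.0) <= eps
                       for w in W if w not in (u, v)):
                    pi[u] |= pi.pop(v)     # replace pi(u) by pi(u) U pi(v), drop v
                    merged = True
                    break
            if merged:
                break
    return pi

# A made-up fractional solution in which every point has degree 2.
xstar = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (0, 4): 0.5, (0, 5): 0.5,
         (3, 4): 0.5, (3, 6): 0.5, (4, 5): 0.5, (4, 6): 0.5, (5, 6): 1.0}
classes = preshrink(xstar, range(7))
print(classes)   # {0: {0, 1, 2, 3, 4}, 5: {5}, 6: {6}}
```

In this toy instance each resulting class $V^*$ satisfies $x^*(V^*, V - V^*) = 2$, as the text requires of the equivalence classes.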

To make the many different choices of $V_1, \ldots, V_k$, we first set the value of a
parameter $t$ that nearly determines the value of $k$ in the sense that $t - 3 \le k \le t$.
Then, for each $w$ in $W$, we choose a subset $C$ of $W$ so that $w \in C$ and
$t - 3 \le |C| \le t$; the corresponding $V_1, \ldots, V_k$ are the $\pi(v)$ with $v \in C$. The choice of $C$ is
guided by the graph with vertex-set $W$, where $u$ and $v$ are adjacent if and only if
$x^*(\pi(u), \pi(v)) > \varepsilon$ for some prescribed zero tolerance $\varepsilon$: starting at $w$, we carry
out a breadth-first search through this graph, until we collect a set $C$ of $t - 3$
vertices. If there are any vertices $u$ outside this $C$ such that $x^*(\pi(u), \pi(v)) = 1$
for some $v$ in $C$, then we keep adding these vertices $u$ to $C$ as long as $|C| < t$.
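A minimal sketch of this selection rule follows. It is not Concorde's code: the function name, the dictionary representation of the shrunk $x^*$, the order in which neighbours are visited, and the reading of "keep adding" as also following value-1 edges from newly added vertices are all assumptions, and the small weight table is invented for illustration.

```python
from collections import deque

def choose_neighborhood(weights, start, t, eps=1e-6):
    """Pick C: breadth-first search up to t - 3 vertices, then value-1 extensions.

    weights maps pairs (u, v) with u < v of shrunk points to x*(pi(u), pi(v)).
    """
    adj = {}
    for (u, v), val in weights.items():
        if val > eps:                       # adjacency in the support graph
            adj.setdefault(u, []).append(v)
            adj.setdefault(v, []).append(u)

    chosen, seen, queue = [], {start}, deque([start])
    while queue and len(chosen) < t - 3:
        u = queue.popleft()
        chosen.append(u)
        for v in sorted(adj.get(u, [])):
            if v not in seen:
                seen.add(v)
                queue.append(v)

    in_c = set(chosen)
    changed = True
    while changed and len(chosen) < t:      # extend along edges of value 1
        changed = False
        for (u, v), val in sorted(weights.items()):
            if abs(val - 1.0) > eps:
                continue
            for outside, inside in ((u, v), (v, u)):
                if outside not in in_c and inside in in_c and len(chosen) < t:
                    chosen.append(outside)
                    in_c.add(outside)
                    changed = True
    return chosen

weights = {(0, 1): 0.5, (0, 2): 0.5, (1, 3): 1.0,
           (2, 4): 0.5, (3, 5): 1.0, (4, 5): 0.5}
print(choose_neighborhood(weights, start=0, t=5))   # [0, 1, 3, 5]
```

Here the search stops after collecting $t - 3 = 2$ vertices, and the two value-1 edges then pull in vertices 3 and 5 without exceeding $t$.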

It seems plausible that such a crude way of choosing $C$ can be improved.
However, we found its performance satisfactory; none of the alternatives that we
tried appeared to work better.

6 Experimental Findings

Reinelt created a library named TSPLIB that contains sample instances of
the TSP (and related problems) from various sources and of various types. There
are 110 TSP instances in this library. Some of them arise from the task of drilling
holes in printed circuit boards and others have been constructed artificially, often
in the Dantzig-Fulkerson-Johnson tradition of choosing a set of actual cities and
defining the cost of travel from X to Y as the distance between X and Y. None
of them (with a single exception, the problem named ts225) is contrived to be
hard and none of them is contrived to be easy; 106 of them have been solved
and four have not.

The results reported in this section involve various TSP instances drawn from
TSPLIB. The default code is Concorde 99.12.15 of Applegate et al., with 99
as the random number seed (concorde -s 99 xxx.tsp). The running times are
given in seconds on a Compaq XP1000 workstation with a 500 MHz EV6 Alpha
processor.



6.1 The Easier TSPLIB Instances 

Our default code solved 87 of the 110 TSPLIB instances in under 1000 seconds. 
These instances are listed in Table 1. 

In Sect. 5 we described a way of producing a number of partitions of $V$
into nonempty sets $V_0, V_1, \ldots, V_k$ with $t - 3 \le k \le t$ for a prescribed positive
integer $t$. Concorde uses it with $t$ ranging between 8 and a prescribed integer
$t_{\max}$. More precisely, the search always begins with $t = 8$. Whenever a value
of $t$ is set, Concorde adds all the resulting cuts produced by Algorithm 3.1 to
the LP relaxation of our problem and it solves the tightened LP relaxation; if
the increase in the value of the relaxation is not satisfactory and $t < t_{\max}$, then
the next iteration takes place with $t$ incremented by one. The default setting,
$t_{\max} = 16$, was prompted by results reported in Table 2. The trends exhibited
in this table are hardly surprising: increases of $t_{\max}$ yield tighter LP relaxations,
but they also require additional time to construct these tighter relaxations.

Table 2 shows also that the total time to solve the 87 TSPLIB instances in 
Table 1 by our default code would increase to 126.8% of its original value if local 
cuts were turned off. This endorsement of local cuts pales in comparison with 
other cutting-plane routines: if we turned off a class of comb-finding procedures
proposed by Padberg and Rinaldi [22] (based on the block decomposition of the
graph obtained by considering only edges $e$ with $0 < x^*_e < 1$), then the running
time over our default code would increase to 427.2% of its original value.

6.2 Three of the Harder TSPLIB Instances 

Two of the harder TSPLIB instances solved by Concorde are the printed-circuit
board instance pcb3038 and the geographical instance fnl4461. On each of
them, we have run tests similar to those of Table 2; their results are reported

Table 1. 87 instances from the TSPLIB
NT denotes the number of nodes in the branch-and-cut tree

name        NT    time   name        NT    time   name        NT    time
burma14      1    0.06   lin105       1    0.59   lin318       1    9.74
ulysses16    1    0.22   pr107        1    1.03   rd400       15  148.42
gr17         1    0.08   gr120        1    2.23   fl417        5   57.75
gr21         1    0.03   pr124        1    3.64   gr431       13  133.29
ulysses22    1    0.53   bier127      1    1.65   pr439       15  216.75
gr24         1    0.07   ch130        1    2.13   pcb442       9   49.92
fri26        1    0.07   pr136        1    3.97   d493         5  113.32
bayg29       1    0.09   gr137        1    3.42   att532       7  109.52
bays29       1    0.13   pr144        1    2.58   ali535       3   53.14
dantzig42    1    0.23   ch150        1    3.03   si535        3   43.13
swiss42      1    0.13   kroA150      1    5.00   pa561       17  246.82
att48        1    0.56   kroB150      1    4.23   u574         1   23.04
gr48         1    0.67   pr152        1    7.93   rat575      25  363.07
hk48         1    0.17   u159         1    1.00   p654         3   26.52
eil51        1    0.73   si175        3   13.09   d657        13  260.37
berlin52     1    0.29   brg180       1    1.46   gr666        3   49.86
brazil58     1    0.68   rat195       5   22.23   u724        11  225.44
st70         1    0.50   d198         3   11.82   rat783       1   37.88
eil76        1    0.30   kroA200      1    6.59   dsj1000      7  410.32
pr76         1    1.86   kroB200      1    3.91   pr1002       1   34.30
gr96         1    6.71   gr202        1    5.01   si1032       1   25.47
rat99        1    0.95   ts225        1   20.52   u1060       21  571.43
kroA100      1    1.00   tsp225       1   15.01   vm1084      11  604.78
kroB100      1    2.36   pr226        1    4.35   pcb1173     19  468.27
kroC100      1    0.96   gr229        3   38.61   rl1304       1  189.20
kroD100      1    1.00   gil262       1   13.06   nrw1379     19  578.42
kroE100      1    2.44   pr264        1    2.67   u1432        3  223.70
rd100        1    0.67   a280         3    5.37   d1655        5  263.03
eil101       1    0.74   pr299        3   17.49   pr2392       1  116.86

in Table 3. (For the sake of uniformity, we have started each of the nine runs 
on each of the two instances with the value of the optimal tour as the upper 
bound.) 

As with Table 2, the trends exhibited in Table 3 are hardly surprising,
although the dependence of running time on $t_{\max}$ is not quite as neat for fnl4461
(note its increase as $t_{\max}$ moves from 0 to 8) and it is even more erratic for
pcb3038. One striking difference between the easier instances on the one hand
and pcb3038, fnl4461 on the other hand is an increase in the optimal setting
of $t_{\max}$, which is 28 for pcb3038 and 32 for fnl4461. These experimental results
agree with the intuition that the harder instances are better off with the larger
values of $t_{\max}$. Another striking difference is the increased effect of local cuts on
the overall running time. With pcb3038, local cuts with the default $t_{\max} = 16$
reduce the running time to 50.3% of its original value and setting $t_{\max} = 28$
reduces it further to 43.8%. For fnl4461, the figures are more impressive: local

Table 2. The effect of t_max on the easier TSPLIB instances

t_max   total time to solve            number of instances
        the 87 instances in Table 1    solved without branching
  0          7424.58                        42
  8          6679.15                        48
 10          6624.02                        52
 12          6248.12                        54
 14          6133.06                        59
 16          5900.72                        59
 18          6394.10                        64
 20          8818.98                        65
 22          9591.02                        65
 24         16519.62                        68
 26         23285.19                        67
 28         35798.49                        64
 30         40732.42                        66

Table 3. The effect of t_max on two harder TSPLIB instances
root LP = optimal value of the LP relaxation before the first branching
NT denotes the number of nodes in the branch-and-cut tree

        pcb3038                            fnl4461
        optimal value = 137694             optimal value = 182566
t_max   root LP      NT    time            root LP      NT      time
  0     137592.61    665   145241.25       182471.39    14977   1990568.46
  8     137595.10    659   162228.02       182487.75    11903   2286217.24
 12     137613.44    383   105026.74       182506.27     2011    297019.61
 16     137625.28    271    73085.72       182519.87     1071    194441.98
 20     137637.25    271   119042.35       182530.12      417     78398.31
 24     137640.66    155    65757.78       182541.98      213     53420.13
 28     137644.28    107    63678.80       182543.60      137     49719.02
 32     137643.42    127   129460.96       182546.60       81     44957.02
 36     137651.53    101   233092.39       182549.72       59     56714.06

cuts with the default tmax = 16 reduce the running time to 9.8% of its original 
value and setting tmax = 32 reduces it further to 2.3%. 
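The percentages quoted for pcb3038 and fnl4461 follow directly from the running times in Table 3; the short check below only redoes that arithmetic (the dictionary layout is an arbitrary choice made here).

```python
# Running times in seconds, copied from Table 3 (t_max = 0 means local cuts off).
times = {
    "pcb3038": {0: 145241.25, 16: 73085.72, 28: 63678.80},
    "fnl4461": {0: 1990568.46, 16: 194441.98, 32: 44957.02},
}

ratios = {}
for name, by_tmax in times.items():
    base = by_tmax[0]
    for tmax, seconds in by_tmax.items():
        if tmax:
            ratios[(name, tmax)] = round(100 * seconds / base, 1)
            print(f"{name}  t_max = {tmax}:  {ratios[(name, tmax)]}% of the time without local cuts")
```

The four ratios come out as 50.3%, 43.8%, 9.8% and 2.3%, matching the figures in the text.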

The hardest TSPLIB instances that we have solved are usa13509 and d15112.
Batoukov and Sørevik refer to our solution of usa13509 as one of “only a few
heroic examples on successful computation on a network of workstations” and
we do not propose to contradict them. This instance was solved by running an
earlier version of Concorde in parallel on a network of 48 workstations, including
Digital Alphas, Intel Pentium IIs and Pentium Pros, and Sun UltraSparcs; a very
rough estimate of the total running time is about 4 years on our single Compaq
XP1000.

Four years of CPU time is an exorbitant figure. We believe that without
the use of local cuts, this exorbitant figure would grow further to a level which
would put solving usa13509 out of our reach even if we had gathered many more
workstations for the heroic purpose.

The experimental results presented in Table 4 support this sentiment. Its 
29 rows correspond to 29 runs that tighten up the root LP relaxation. Each 
of these runs starts where the preceding run has stopped and it takes note of 
the gap g between the length of the optimal tour and the optimal value of the 
current LP relaxation. Then it attempts to tighten the relaxation by additional 
cuts and it solves this tightened relaxation; this step gets iterated as long as the 
optimal value of the relaxation keeps increasing by at least a prescribed fixed 
percentage of g. 

We begin with five runs of Def C00, meaning our default code with local cuts
turned off. The last three of these runs yield only imperceptible improvements
and they narrow the relative gap to 0.0561%. Then we follow with four runs of
All C00, meaning Def C00 with additional cutting-plane routines (other than
local cuts) that we found to be a hindrance in solving the easy instances from
Table 1 but that might help in solving the harder instances. (These techniques
include our implementation of what we understood from Naddef and Thienel [54,
55], the dominos-and-necklaces heuristic for finding violated comb inequalities
that has been described in Sect. 3 of Applegate et al., and other cutting-plane
routines.) The last three of these runs yield only imperceptible improvements
and they narrow the relative gap to 0.0435%. This is the best we can do without
the use of local cuts.

Then we bring in local cuts in nineteen runs of the default code with $t_{\max}$
progressing through the sequence 8, 10, 12, . . . , 44. As a result, the relative gap
is narrowed to 0.0164%, less than two fifths of the previous value. Finally, an
additional run of All C00 yields only an imperceptible improvement.

7 Generalizations 

The cutting-plane method and its descendants are applicable to any problem

    minimize $c^T x$ subject to $x \in S$,

where $S$ is a finite subset of some Euclidean space $\mathbb{R}^m$, provided that an
efficient algorithm to recognize points of $S$ is available. The corresponding general
problem of finding cuts is this:

    given (by means of an efficient membership-testing oracle)
    a finite subset $S$ of some $\mathbb{R}^m$ and
    given a point $x^*$ in $\mathbb{R}^m$ that lies outside the convex hull of $S$,
    find a vector $a$ and a scalar $b$ such that

    $S \subseteq \{x : a^T x \le b\}$ and $a^T x^* > b$.
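For a tiny, explicitly listed $S$ the two defining conditions of a cut can be checked by brute force. The sketch below is purely illustrative: the function name `is_cut` and the three-point set are assumptions, and in the setting of the text $S$ is of course far too large to enumerate.

```python
def dot(p, q):
    """Inner product of two vectors given as sequences of numbers."""
    return sum(pi * qi for pi, qi in zip(p, q))

def is_cut(a, b, S, x_star):
    """True if a^T x <= b holds on all of S while a^T x* > b."""
    return all(dot(a, x) <= b for x in S) and dot(a, x_star) > b

S = [(0, 0), (1, 0), (0, 1)]          # a toy finite set in the plane
x_star = (0.75, 0.75)                 # lies outside the convex hull of S

print(is_cut((1, 1), 1, S, x_star))   # True:  x1 + x2 <= 1 cuts off x*
print(is_cut((1, 0), 1, S, x_star))   # False: x1 <= 1 is valid but not violated
```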

Algorithm 3.1, dealing with the special case where $m = n(n - 1)/2$ and
$S$ is the set of the incidence vectors of all the tours through a set of $n$ cities,
generalizes to Algorithm 7.1. (In the special case, $d = (k + 1)k/2$, $\phi(x) = \bar{x}$, and
$T$ is the set of all tangled tours through $V$.)

Table 4. Local cuts and usa13509

usa13509, optimal value = 19982859

Code        root LP        gap       time
Def C00     19967864.84    0.0751%    2157.08
+Def C00    19970751.68    0.0606%    2795.47
+Def C00    19971645.49    0.0561%    1140.06
+Def C00    19971656.19    0.0561%     165.04
+Def C00    19971657.09    0.0561%     155.35
+All C00    19973413.70    0.0473%    4293.28
+All C00    19974175.97    0.0435%    9455.69
+All C00    19974177.19    0.0435%     733.44
+All C00    19974180.02    0.0435%     766.91
+Def C08    19974585.45    0.0414%    2559.30
+Def C10    19975711.27    0.0358%    5131.98
+Def C12    19976111.11    0.0338%    3931.73
+Def C14    19976520.40    0.0317%    4355.10
+Def C16    19976898.82    0.0298%    4288.43
+Def C18    19977508.51    0.0268%    8964.28
+Def C20    19978299.28    0.0228%   10901.51
+Def C22    19978577.43    0.0214%    6126.33
+Def C24    19978738.27    0.0206%    5478.57
+Def C26    19978861.69    0.0200%    5556.20
+Def C28    19979011.25    0.0193%   10791.34
+Def C30    19979133.77    0.0186%    6802.63
+Def C32    19979135.92    0.0186%    1676.74
+Def C34    19979382.66    0.0174%   32465.95
+Def C36    19979384.65    0.0174%    7018.06
+Def C38    19979507.22    0.0168%   36408.27
+Def C40    19979571.48    0.0165%   25237.98
+Def C42    19979574.20    0.0164%    9143.59
+Def C44    19979576.01    0.0164%   12623.30
+All C00    19979583.83    0.0164%    1316.49

Algorithm 7.1. A very general scheme for collecting cuts

initialize an empty list $\mathcal{L}$ of cuts;
for selected small integers $d$, linear mappings $\phi : \mathbb{R}^m \to \mathbb{R}^d$, and
    finite subsets $T$ of $\mathbb{R}^d$ such that $\phi(S) \subseteq T$
do  if   $\phi(x^*)$ lies outside the convex hull of $T$
    then find a vector $a$ and a scalar $b$ such that
             $T \subseteq \{\bar{x} : a^T \bar{x} \le b\}$ and $a^T \phi(x^*) > b$;
         add the cut $a^T \phi(x) \le b$ to $\mathcal{L}$;
    end
end
return $\mathcal{L}$;

The trick of trying to separate $x^*$ from $S$ by separating $\phi(x^*)$ from $T$ was
used previously by Crowder, Johnson, and Padberg [17] in the context of integer
linear programming, where $S$ consists of all integer solutions of some explicitly
recorded system

    $Ax = b$, $\ell \le x \le u$    (31)

and $x^*$ satisfies (31) in place of $x$. Crowder, Johnson, and Padberg consider
systems (31) such that $A$ is sparse and $\ell = 0$, $u = e$; for each equation $\alpha^T x = \beta$
in the system $Ax = b$, they consider the set $T$ of all integer solutions of

    $\alpha^T x = \beta$, $0 \le x \le e$

restricted on components $x_j$ such that $\alpha_j \ne 0$; with $\phi(x)$ standing for the
restriction of $x$ on these components, they try to separate $x^*$ from $S$ by separating
$\phi(x^*)$ from $T$. In the attempt to separate $\phi(x^*)$ from $T$, they use exclusively
inequalities that match certain prescribed templates and they use special-purpose
separation algorithms to find such cuts.

Boyd [8, 9] starts out with the choices of $\phi$ and $T$ made by Crowder, Johnson,
and Padberg, but then he separates $\phi(x^*)$ from $T$ by a general-purpose
procedure. He solves the problem

    maximize $z$
    subject to $z - a^T \phi(x^*) + a^T \phi(x) \le 0$ for all $x$ in $T$,
               $\|a\|_1 \le \gamma$, $\|a\|_\infty \le 1$

with a prescribed constant $\gamma$ by a method that is essentially the simplex method
(in particular, the value of $z$ increases with each nondegenerate iteration); to
access $T$, he uses an oracle implemented by a dynamic programming algorithm.
If the optimal value $z^*$ turns out to be positive, then he returns the cut

    $a^T \phi(x) \le a^T \phi(x^*) - z^*$,

which he calls a Fenchel cutting plane.

The technique described in Sect. 4.1 provides another implementation of the
body of the for loop in Algorithm 7.1 for a fairly general class of sets $T$: it
requires only an efficient oracle that, given a vector $a$ in $\mathbb{R}^d$, returns either an
$x$ in $T$ that maximizes $a^T x$ or the message “infeasible” indicating that $T$ is
empty. This technique - presented as Algorithm 4.1 in the special case where the
set of all strongly constrained tangled tours is substituted for $T$ - is reviewed as
Algorithm 7.2: function Separate, given (by means of an efficient maximization
oracle) a finite subset $T$ of some $\mathbb{R}^d$ and given a point $x^*$ in $\mathbb{R}^d$, returns either
the message “$x^*$ is in the convex hull of $T$” or a vector $a$ and a scalar $b$ such
that

    $T \subseteq \{x : a^T x \le b\}$ and $a^T x^* > b$.

Algorithm 7.2. Separate($x^*$, $T$)

if   $T \ne \emptyset$
then $A$ = any $d \times 1$ matrix whose single column is an element of $T$;
     repeat if   the linear programming problem
                     minimize $u^T e + v^T e$
                     subject to $-a^T A + b e^T \ge 0$,
                                $a^T x^* - b = 1$,
                                $a^T + u^T - v^T = 0$,
                                $u^T, v^T \ge 0$
                 has an optimal solution
            then find an element $x$ of $T$ that maximizes $a^T x$;
                 if   $a^T x \le b$
                 then return $a$ and $b$;
                 else add $x$ to $A$ as a new column;
                 end
            else return the message
                 “$x^*$ is in the convex hull of $T$”;
            end
     end
else return 0 and $-1$;
end



One way of solving the linear programming problem in each iteration of the
repeat loop in Algorithm 7.2 is to apply the simplex method to its dual with
relatively few rows,

    maximize $s$
    subject to $s x^* - A\lambda + w = 0$,
               $-s + e^T \lambda = 0$,
               $\lambda \ge 0$, $-e \le w \le e$,    (32)

just as Concorde does in its implementation of Algorithm 4.1.



Modifications of Separate($x^*$, $T$) used by Concorde in its implementation of
Algorithm 3.1 may also apply to implementations of Algorithm 7.1. In particular,
if we have at our disposal a nonempty family $\mathcal{F}$ of row vectors $[v^T, w]$ such that

    $T \subseteq \{x : v^T x \ge w\}$ and $v^T x^* = w$,

then we may test the condition

    $x^*$ lies outside the convex hull of $T$

in Algorithm 7.1 by calling Separate($x^*$, $T^*$) with

    $T^* = \{x \in T : v^T x = w$ for all $[v^T, w]$ in $\mathcal{F}\}$.

(In Concorde, $T^*$ is the set of all strongly constrained tangled tours through
$V$.) The point of this substitution is that Separate($x^*$, $T^*$) tends to run faster
as the dimension of $\{x \in \mathbb{R}^d : v^T x = w$ for all $[v^T, w]$ in $\mathcal{F}\}$ decreases; if most
of our choices of $d$, $\phi$, and $T$ in Algorithm 7.1 place $x^*$ in the convex hull of
$T$, then we may save time even if each successful call of Separate($x^*$, $T^*$) is
followed by a call of Separate($x^*$, $T$). In fact, if Separate($x^*$, $T^*$) returns an
inequality $a^{*T} x \le b^*$ that separates $x^*$ from $T^*$, then there is no need to call
Separate($x^*$, $T$): for every nonnegative $M$ such that

    $M \ge \dfrac{a^{*T} x - b^*}{\sum(v^T x - w : [v^T, w] \in \mathcal{F})}$ for all $x$ in $T - T^*$,

the inequality

    $a^{*T} x - M \cdot \sum(v^T x - w : [v^T, w] \in \mathcal{F}) \le b^*$

separates $x^*$ from $T$.
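When $T$ is small enough to list, the smallest admissible $M$ and the resulting inequality can be computed directly. The sketch below is an illustration under assumptions: the function names `smallest_M` and `lift`, the family written as a list of pairs `(v, w)`, and the toy data (the four 0-1 points of the plane with the single constraint $x_2 \ge 0$) are all invented here.

```python
def dot(p, q):
    """Inner product of two vectors given as sequences of numbers."""
    return sum(pi * qi for pi, qi in zip(p, q))

def smallest_M(a_star, b_star, family, T):
    """Least nonnegative M satisfying the displayed bound for every x in T - T*."""
    M = 0.0
    for x in T:
        slack = sum(dot(v, x) - w for v, w in family)
        if slack > 0:                      # x violates some equation, so x is in T - T*
            M = max(M, (dot(a_star, x) - b_star) / slack)
    return M

def lift(a_star, b_star, family, M):
    """Coefficients (a, b) of a*^T x - M * sum(v^T x - w) <= b* written as a^T x <= b."""
    a = list(a_star)
    b = b_star
    for v, w in family:
        a = [ai - M * vi for ai, vi in zip(a, v)]
        b -= M * w
    return a, b

T = [(0, 0), (1, 0), (0, 1), (1, 1)]
family = [((0, 1), 0)]                     # x2 >= 0 on T, and x2 = 0 at x*
x_star = (1.5, 0.0)
a_star, b_star = (1, 2), 1                 # separates x* from T* = {(0,0), (1,0)}

M = smallest_M(a_star, b_star, family, T)
a, b = lift(a_star, b_star, family, M)
print(M, a, b)                             # 2.0 [1.0, 0.0] 1.0
```

In this toy case the lifted inequality is $x_1 \le 1$, which is valid for all of $T$ and still violated by $x^*$.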

Furthermore, as long as $x^*$ is in the affine hull of $T$, every cut separating $x^*$
from $T$ may be converted to a cut that induces a facet of the convex hull of $T$
by the techniques described in Sect. 4.3, where the set of all moderately
constrained tangled tours through $V$ plays the role of $T$. Specifically, Algorithm 7.3,
given any cut $a^T x \le b$, returns a cut inducing a facet of the convex hull of $T$. Here,
a default $\mathcal{I}$ is supplied by the optimal basis of problem (32), a default $\mathcal{C}$ is supplied by all
$[v^T, w]$ in $\mathcal{F}$ such that $T \not\subseteq \{x : v^T x = w\}$, and Tilt($a$, $b$, $v$, $w$, $x^0$) is Algorithm
4.3 with $T$ substituted for the set of all moderately constrained tangled tours.

Algorithm 7.3. From a cut $a^T x \le b$ to a facet-inducing cut in general

$\mathcal{I}$ = an affinely independent subset of $\{x \in T : a^T x = b\}$;
$\mathcal{C}$ = a catalog of vectors $[v^T, w]$ such that $T \subseteq \{x : v^T x \ge w\}$, $v^T x^* \ge w$,
    and $v^T x > w$ for some $x$ in $T$;
$x^0$ = an arbitrary element of $T$;
for all $[v^T, w]$ in $\mathcal{C}$
do  remove $[v^T, w]$ from $\mathcal{C}$;
    if   $\mathcal{I} \subseteq \{x : v^T x = w\}$
    then $(a^+, b^+, x^+)$ = Tilt($a$, $b$, $v$, $w$, $x^0$);
         $a = a^+$, $b = b^+$, $\mathcal{I} = \mathcal{I} \cup \{x^+\}$;
    end
end
while $|\mathcal{I}| < \dim T$
do  $x^0$ = an element of $T - \mathcal{I}$ such that $\mathcal{I} \cup \{x^0\}$ is affinely independent;
    if   $a^T x^0 = b$
    then replace $(a, b; \mathcal{I})$ by $(a, b; \mathcal{I} \cup \{x^0\})$;
    else find a nonzero vector $v$ and a number $w$ such that
             $v^T x - w = 0$ for all $x$ in $\mathcal{I} \cup \{x^0\}$,
             $v^T x - w \ne 0$ for some $x$ in $T$;
         $(a^+, b^+, x^+)$ = Tilt($a$, $b$, $v$, $w$, $x^0$);
         $(a^-, b^-, x^-)$ = Tilt($a$, $b$, $-v$, $-w$, $x^0$);
         if   $a^{+T} x^* - b^+ > a^{-T} x^* - b^-$
         then replace $(a, b; \mathcal{I})$ by $(a^+, b^+; \mathcal{I} \cup \{x^+\})$;
         else replace $(a, b; \mathcal{I})$ by $(a^-, b^-; \mathcal{I} \cup \{x^-\})$;
         end
    end
end
return $a$ and $b$;



By the way, Algorithm [72] provides a constructive proof of the following 
corollary of a classic theorem of Minkowski (jSD] Section 19): 

Theorem 1. Let $T$ be a finite subset of some $\mathbb{R}^d$, let $x^*$ be a point in the affine
hull of $T$, and let an inequality $a^T x \le b$ separate $T$ from $x^*$ in the sense that

$$T \subseteq \{x : a^T x \le b\} \quad \text{and} \quad a^T x^* > b.$$

Then there is an inequality $\alpha^T x \le \beta$ that separates $T$ from $x^*$, induces a facet
of the convex hull of $T$, and satisfies

$$\{x \in T : a^T x = b\} \subseteq \{x \in T : \alpha^T x = \beta\}.$$
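The statement of Theorem 1 can be illustrated on a small example. The sketch below is an illustration under stated assumptions, not the constructive algorithm itself: it takes the vertices of the unit square as a hypothetical $T$, a hypothetical point $x^* = (3/2, 3/2)$, and a weak cut $x_1 + x_2 \le 2$, and verifies by exact arithmetic that the hand-picked inequality $x_1 \le 1$ has the three properties the theorem promises. All helper names are illustrative.

```python
from fractions import Fraction as Fr

def dot(u, x):
    return sum(ui * xi for ui, xi in zip(u, x))

def rank(rows):
    """Rank of a list of equal-length rows, by exact Gaussian elimination."""
    m = [[Fr(v) for v in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [mi - f * mr for mi, mr in zip(m[i], m[r])]
        r += 1
    return r

def affine_dim(points):
    """Dimension of the affine hull of a nonempty list of points."""
    p0 = points[0]
    diffs = [[x - y for x, y in zip(p, p0)] for p in points[1:]]
    return rank(diffs) if diffs else 0

def face(T, a, b):
    """Points of T that satisfy a.x <= b with equality."""
    return [x for x in T if dot(a, x) == b]

def separates(T, x_star, a, b):
    """True if a.x <= b holds on T and is violated by x_star."""
    return all(dot(a, x) <= b for x in T) and dot(a, x_star) > b

def induces_facet(T, a, b):
    """True if a.x <= b is valid for T and its face has dimension dim(T) - 1."""
    F = face(T, a, b)
    valid = all(dot(a, x) <= b for x in T)
    return valid and len(F) > 0 and affine_dim(F) == affine_dim(T) - 1

# Hypothetical instance: unit square, point outside it, weak and strong cuts.
T = [(0, 0), (1, 0), (0, 1), (1, 1)]
x_star = (Fr(3, 2), Fr(3, 2))
weak = ((1, 1), 2)      # x1 + x2 <= 2 touches T only at (1, 1)
strong = ((1, 0), 1)    # x1 <= 1 touches T along an edge
```

Here the weak cut separates $T$ from $x^*$ but meets $T$ in a single point, whereas the strong cut also separates, induces a facet, and its face contains the face of the weak cut.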



Success of Algorithm 7.1 hinges on the ability to make choices of $\phi$ and $T$
(which may be guided by $x^*$ and $S$) in such a way that

(i) chances of $\phi(x^*)$ falling outside the convex hull of $T$ are reasonable

and

(ii) cuts $a^T \phi(x) \le b$ collected in C are not too weak.

Our way (described in Sect. 5) of making these choices in the special case where
$S$ is the set of all tours through a set $V$ meets both of these criteria. With respect
to (i), it is adequate: typically, one out of fifty to a hundred of our choices of
$\phi$ and $T$ makes $\phi(x^*)$ fall outside the convex hull of $T$. With respect to (ii),
it could hardly be better: there is no known counterexample to the conjecture 
(implicit in Naddef and Rinaldi [^) that 

$a^T \phi(x) \le b$ induces a facet of the convex hull of $S$

whenever $a^T x \le b$ induces a facet of the convex hull of $T$ and $a^T \phi(x^*) > b$.

This may be a major reason behind the success of our application of Algorithm 7.1
to the traveling salesman problem. 

If one were to use Algorithm 7.1 in the context of integer linear programming,
where $S$ consists of all integer solutions of some explicitly recorded system

$$Ax = b, \quad \ell \le x \le u,$$

then one would have to design a way of making choices of $\phi$ and $T$. One option is
to choose each $\phi$ and $T$ by choosing a $d \times m$ integer matrix $P$, setting $\phi(x) = Px$,
and letting $T$ consist of all integer vectors $\bar{x}$ such that some $x$ satisfies

$$Ax = b, \quad \ell \le x \le u, \quad Px = \bar{x}.$$

Maximizing a linear function over $T$ amounts to solving a mixed integer linear
programming problem in $d$ integer and $m$ non-integer variables; as long as $d$ is
small, this problem can be solved quickly. As long as at least one row $r^T$ of $P$
satisfies

$$\lfloor \max\{r^T x : Ax = b,\ \ell \le x \le u\} \rfloor < r^T x^*, \qquad (33)$$

$\phi(x^*)$ falls outside the convex hull of $T$; Gomory's methods mentioned in Sect. 2
provide vectors $r$ with property (33) at insignificant computational cost. We
have not carried out any experiments with this scheme.
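Condition (33) can be illustrated on a tiny instance. The sketch below is only an illustration with hypothetical data: the system $2x_1 + 2x_2 + x_3 = 3$ with bounds $0 \le x \le (1, 1, 3)$, the row $r = (1, 1, 0)$, and the fractional point $x^* = (3/4, 3/4, 0)$ are all assumptions. The value of the linear programming maximum, $3/2$, is worked out by hand and passed in as an argument rather than computed by a solver, and the integer solutions are enumerated by brute force, which is feasible only for very small boxes.

```python
from fractions import Fraction as Fr
from itertools import product
from math import floor

def dot(u, x):
    return sum(ui * xi for ui, xi in zip(u, x))

def integer_solutions(A, b, lower, upper):
    """Enumerate integer x with A x = b and lower <= x <= upper."""
    ranges = [range(lo, hi + 1) for lo, hi in zip(lower, upper)]
    return [x for x in product(*ranges)
            if all(dot(row, x) == bi for row, bi in zip(A, b))]

def satisfies_33(r, lp_max, x_star):
    """Test floor(lp_max) < r.x*, where lp_max is ASSUMED to equal
    max{r.x : Ax = b, l <= x <= u}, obtained elsewhere."""
    return floor(lp_max) < dot(r, x_star)

# Hypothetical instance.
A = [[2, 2, 1]]
b = [3]
lower, upper = (0, 0, 0), (1, 1, 3)
r = (1, 1, 0)
x_star = (Fr(3, 4), Fr(3, 4), 0)
lp_max = Fr(3, 2)  # x1 + x2 = (3 - x3) / 2 is largest at x3 = 0

S = integer_solutions(A, b, lower, upper)
image = sorted({dot(r, x) for x in S})  # phi(S) for the one-row matrix P = [r]
```

With $d = 1$ and $P$ consisting of the single row $r^T$, every integer solution maps to 0 or 1, while $\phi(x^*) = 3/2$; the rounded bound $r^T x \le 1$ is therefore a cut of the kind that (33) guarantees.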